{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary statistics from last week's data analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "| Column           | Example values                                   | Distinct values       |\n",
    "| ---------------- | ------------------------------------------------ | --------------------- |\n",
    "| id               | 5.21159374e+11                                   | 40428967              |\n",
    "| click            | 0,1                                              | 2                     |\n",
    "| hour             | 14102100 14102101                                | 240                   |\n",
    "| C1               | 1001 1002 1005 1007 1008 1010 1012               | 7                     |\n",
    "| banner_pos       | 0 1 2 3 4 5 7                                    | 7                     |\n",
    "| site_id          | '000aa1a4' ... 'fffe8e1c'                        | 4737                  |\n",
    "| site_domain      | '004d30ed' ... 'ffdec903'                        | 7745                  |\n",
    "| site_category    | '0569f928' …'f66779e6'                           | 26                    |\n",
    "| app_id           | '000d6291' ... 'ffef3b38'                        | 8552                  |\n",
    "| app_domain       | 'fea0d84a'…'ff6630e0'                            | 559                   |\n",
    "| app_category     | 'd1327cf5' …'fc6fa53d'                           | 36                    |\n",
    "| device_id        | '00000919' ... 'ffffde2c'                        | 2686408               |\n",
    "| device_ip        | '00000911' ... 'fffff971'                        | 6729486               |\n",
    "| device_model     | '000ab70c' ... 'ffe72be2'                        | 8251                  |\n",
    "| device_type      | 0 1 2 4 5                                        | 5                     |\n",
    "| device_conn_type | 0 2 3 5                                          | 4                     |\n",
    "| C14              | 375 ... 24052                                    | 2626                  |\n",
    "| C15              | 120    216  300  320    480  728  768 1024       | 8                     |\n",
    "| C16              | 20     36   50   90    250  320  480    768 1024 | 9                     |\n",
    "| C17              | 112…2758                                         | 435                   |\n",
    "| C18              | 0 1 2 3                                          | 4                     |\n",
    "| C19              | 33…1959                                          | 68                    |\n",
    "| C20              | -1 100000 100001…100248                          | 172                   |\n",
    "| C21              | 1 … 219                                          | 60                    |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Based on the analysis above, the following features were selected for training:\n",
    "\n",
    "|      | Selected feature | Data type   | Distinct values | Processing                                                   |\n",
    "| ---- | ---------------- | ----------- | --------------- | ------------------------------------------------------------ |\n",
    "| 1    | C1               | numeric     | 7               | treat as categorical                                         |\n",
    "| 2    | banner_pos       | numeric     | 7               | treat as categorical                                         |\n",
    "| 3    | device_type      | numeric     | 5               | none                                                         |\n",
    "| 4    | device_conn_type | numeric     | 4               | none                                                         |\n",
    "| 5    | C17              | numeric     | 435             | none                                                         |\n",
    "| 6    | C15              | numeric     | 8               | combine C15 × C16 into a single feature                      |\n",
    "| 7    | C16              | numeric     | 9               | combine C15 × C16 into a single feature                      |\n",
    "| 8    | C18              | numeric     | 4               | none                                                         |\n",
    "| 9    | C19              | numeric     | 68              | none                                                         |\n",
    "| 10   | C20              | numeric     | 172             | none                                                         |\n",
    "| 11   | C21              | numeric     | 60              | none                                                         |\n",
    "| 12   | hour             | time        | 240             | bin by time period, then encode as categorical or continuous |\n",
    "| 13   | site_category    | categorical | 26              | LabelEncoder or hashing                                      |\n",
    "| 14   | app_domain       | categorical | 559             | LabelEncoder or hashing                                      |\n",
    "| 15   | app_category     | categorical | 36              | LabelEncoder or hashing                                      |\n",
    "| 16   | device_model     | categorical | 8251            | LabelEncoder or hashing                                      |\n",
    "| 17   | device_id        | categorical | 2686408         | device_id + device_ip, hashed to represent a user            |\n",
    "| 18   | device_ip        | categorical | 6729486         | device_id + device_ip, hashed to represent a user            |\n",
    "\n",
    "The columns id, app_id, site_id, site_domain, and C14 are dropped outright."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Add the click-through rate conditioned on each feature value (the posterior click probability) as a new feature"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "path = 'data/train_info/'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Feature encoding"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Retained features:\n",
    "feature_list = ['hour', 'C1', 'banner_pos', 'site_id', 'site_domain',\n",
    "       'site_category', 'app_id', 'app_domain', 'app_category', 'device_id',\n",
    "       'device_ip', 'device_model', 'device_type', 'device_conn_type', 'C14',\n",
    "       'C15', 'C16', 'C17', 'C18', 'C19', 'C20', 'C21']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_file = pd.read_csv('data/train_1.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>click</th>\n",
       "      <th>hour</th>\n",
       "      <th>C1</th>\n",
       "      <th>banner_pos</th>\n",
       "      <th>site_id</th>\n",
       "      <th>site_domain</th>\n",
       "      <th>site_category</th>\n",
       "      <th>app_id</th>\n",
       "      <th>app_domain</th>\n",
       "      <th>...</th>\n",
       "      <th>device_type</th>\n",
       "      <th>device_conn_type</th>\n",
       "      <th>C14</th>\n",
       "      <th>C15</th>\n",
       "      <th>C16</th>\n",
       "      <th>C17</th>\n",
       "      <th>C18</th>\n",
       "      <th>C19</th>\n",
       "      <th>C20</th>\n",
       "      <th>C21</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.005585e+19</td>\n",
       "      <td>0</td>\n",
       "      <td>14102100</td>\n",
       "      <td>1005</td>\n",
       "      <td>0</td>\n",
       "      <td>85f751fd</td>\n",
       "      <td>c4e18dd6</td>\n",
       "      <td>50e219e0</td>\n",
       "      <td>0acbeaa3</td>\n",
       "      <td>45a51db4</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>20596</td>\n",
       "      <td>320</td>\n",
       "      <td>50</td>\n",
       "      <td>2161</td>\n",
       "      <td>0</td>\n",
       "      <td>35</td>\n",
       "      <td>-1</td>\n",
       "      <td>157</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.010206e+19</td>\n",
       "      <td>0</td>\n",
       "      <td>14102100</td>\n",
       "      <td>1005</td>\n",
       "      <td>0</td>\n",
       "      <td>85f751fd</td>\n",
       "      <td>c4e18dd6</td>\n",
       "      <td>50e219e0</td>\n",
       "      <td>51cedd4e</td>\n",
       "      <td>aefc06bd</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>18648</td>\n",
       "      <td>320</td>\n",
       "      <td>50</td>\n",
       "      <td>1092</td>\n",
       "      <td>3</td>\n",
       "      <td>1835</td>\n",
       "      <td>100156</td>\n",
       "      <td>61</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1.020168e+19</td>\n",
       "      <td>0</td>\n",
       "      <td>14102100</td>\n",
       "      <td>1005</td>\n",
       "      <td>0</td>\n",
       "      <td>1fbe01fe</td>\n",
       "      <td>f3845767</td>\n",
       "      <td>28905ebd</td>\n",
       "      <td>ecad2386</td>\n",
       "      <td>7801e8d9</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>15703</td>\n",
       "      <td>320</td>\n",
       "      <td>50</td>\n",
       "      <td>1722</td>\n",
       "      <td>0</td>\n",
       "      <td>35</td>\n",
       "      <td>-1</td>\n",
       "      <td>79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1.025000e+19</td>\n",
       "      <td>1</td>\n",
       "      <td>14102100</td>\n",
       "      <td>1005</td>\n",
       "      <td>0</td>\n",
       "      <td>1fbe01fe</td>\n",
       "      <td>f3845767</td>\n",
       "      <td>28905ebd</td>\n",
       "      <td>ecad2386</td>\n",
       "      <td>7801e8d9</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>15702</td>\n",
       "      <td>320</td>\n",
       "      <td>50</td>\n",
       "      <td>1722</td>\n",
       "      <td>0</td>\n",
       "      <td>35</td>\n",
       "      <td>100084</td>\n",
       "      <td>79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1.052369e+19</td>\n",
       "      <td>0</td>\n",
       "      <td>14102100</td>\n",
       "      <td>1005</td>\n",
       "      <td>0</td>\n",
       "      <td>85f751fd</td>\n",
       "      <td>c4e18dd6</td>\n",
       "      <td>50e219e0</td>\n",
       "      <td>0acbeaa3</td>\n",
       "      <td>45a51db4</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>20596</td>\n",
       "      <td>320</td>\n",
       "      <td>50</td>\n",
       "      <td>2161</td>\n",
       "      <td>0</td>\n",
       "      <td>35</td>\n",
       "      <td>100148</td>\n",
       "      <td>157</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 24 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "             id  click      hour    C1  banner_pos   site_id site_domain  \\\n",
       "0  1.005585e+19      0  14102100  1005           0  85f751fd    c4e18dd6   \n",
       "1  1.010206e+19      0  14102100  1005           0  85f751fd    c4e18dd6   \n",
       "2  1.020168e+19      0  14102100  1005           0  1fbe01fe    f3845767   \n",
       "3  1.025000e+19      1  14102100  1005           0  1fbe01fe    f3845767   \n",
       "4  1.052369e+19      0  14102100  1005           0  85f751fd    c4e18dd6   \n",
       "\n",
       "  site_category    app_id app_domain ...  device_type device_conn_type    C14  \\\n",
       "0      50e219e0  0acbeaa3   45a51db4 ...            1                0  20596   \n",
       "1      50e219e0  51cedd4e   aefc06bd ...            1                0  18648   \n",
       "2      28905ebd  ecad2386   7801e8d9 ...            1                0  15703   \n",
       "3      28905ebd  ecad2386   7801e8d9 ...            1                2  15702   \n",
       "4      50e219e0  0acbeaa3   45a51db4 ...            1                0  20596   \n",
       "\n",
       "   C15  C16   C17  C18   C19     C20  C21  \n",
       "0  320   50  2161    0    35      -1  157  \n",
       "1  320   50  1092    3  1835  100156   61  \n",
       "2  320   50  1722    0    35      -1   79  \n",
       "3  320   50  1722    0    35  100084   79  \n",
       "4  320   50  2161    0    35  100148  157  \n",
       "\n",
       "[5 rows x 24 columns]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_file.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Print the distinct values of every column in the test set.\n",
    "test_file = pd.read_csv('data/test')\n",
    "for column in test_file.columns:\n",
    "    # The file is already loaded, so read each column from the DataFrame.\n",
    "    unique_vals = np.unique(test_file[column])\n",
    "    print(unique_vals)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "hour_rate = pd.read_csv(path+'clickVShour.csv', usecols=['hour','avg(click)'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>hour</th>\n",
       "      <th>avg(click)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>14102100</td>\n",
       "      <td>0.174714</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>14102101</td>\n",
       "      <td>0.173695</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>14102102</td>\n",
       "      <td>0.150696</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>14102103</td>\n",
       "      <td>0.169791</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>14102104</td>\n",
       "      <td>0.151206</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       hour  avg(click)\n",
       "0  14102100    0.174714\n",
       "1  14102101    0.173695\n",
       "2  14102102    0.150696\n",
       "3  14102103    0.169791\n",
       "4  14102104    0.151206"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hour_rate.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Attach the per-hour click-through rate to each training row, joining on 'hour'.\n",
    "tran1 = pd.merge(train_file, hour_rate, on='hour')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "tran1.rename(columns={'avg(click)':'hour_rate'}, inplace = True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>click</th>\n",
       "      <th>hour</th>\n",
       "      <th>C1</th>\n",
       "      <th>banner_pos</th>\n",
       "      <th>site_id</th>\n",
       "      <th>site_domain</th>\n",
       "      <th>site_category</th>\n",
       "      <th>app_id</th>\n",
       "      <th>app_domain</th>\n",
       "      <th>...</th>\n",
       "      <th>device_conn_type</th>\n",
       "      <th>C14</th>\n",
       "      <th>C15</th>\n",
       "      <th>C16</th>\n",
       "      <th>C17</th>\n",
       "      <th>C18</th>\n",
       "      <th>C19</th>\n",
       "      <th>C20</th>\n",
       "      <th>C21</th>\n",
       "      <th>hour_rate</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>100980</th>\n",
       "      <td>9.766275e+18</td>\n",
       "      <td>1</td>\n",
       "      <td>14103023</td>\n",
       "      <td>1005</td>\n",
       "      <td>0</td>\n",
       "      <td>1fbe01fe</td>\n",
       "      <td>f3845767</td>\n",
       "      <td>28905ebd</td>\n",
       "      <td>ecad2386</td>\n",
       "      <td>7801e8d9</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>22676</td>\n",
       "      <td>320</td>\n",
       "      <td>50</td>\n",
       "      <td>2616</td>\n",
       "      <td>0</td>\n",
       "      <td>35</td>\n",
       "      <td>100084</td>\n",
       "      <td>51</td>\n",
       "      <td>0.168836</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>100981</th>\n",
       "      <td>9.829227e+18</td>\n",
       "      <td>1</td>\n",
       "      <td>14103023</td>\n",
       "      <td>1005</td>\n",
       "      <td>0</td>\n",
       "      <td>1fbe01fe</td>\n",
       "      <td>f3845767</td>\n",
       "      <td>28905ebd</td>\n",
       "      <td>ecad2386</td>\n",
       "      <td>7801e8d9</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>22676</td>\n",
       "      <td>320</td>\n",
       "      <td>50</td>\n",
       "      <td>2616</td>\n",
       "      <td>0</td>\n",
       "      <td>35</td>\n",
       "      <td>-1</td>\n",
       "      <td>51</td>\n",
       "      <td>0.168836</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>100982</th>\n",
       "      <td>9.877575e+18</td>\n",
       "      <td>0</td>\n",
       "      <td>14103023</td>\n",
       "      <td>1005</td>\n",
       "      <td>0</td>\n",
       "      <td>85f751fd</td>\n",
       "      <td>c4e18dd6</td>\n",
       "      <td>50e219e0</td>\n",
       "      <td>7e7baafa</td>\n",
       "      <td>2347f47a</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>23866</td>\n",
       "      <td>320</td>\n",
       "      <td>50</td>\n",
       "      <td>2736</td>\n",
       "      <td>0</td>\n",
       "      <td>33</td>\n",
       "      <td>100170</td>\n",
       "      <td>246</td>\n",
       "      <td>0.168836</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>100983</th>\n",
       "      <td>9.911338e+18</td>\n",
       "      <td>0</td>\n",
       "      <td>14103023</td>\n",
       "      <td>1005</td>\n",
       "      <td>0</td>\n",
       "      <td>85f751fd</td>\n",
       "      <td>c4e18dd6</td>\n",
       "      <td>50e219e0</td>\n",
       "      <td>9c13b419</td>\n",
       "      <td>2347f47a</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>23161</td>\n",
       "      <td>320</td>\n",
       "      <td>50</td>\n",
       "      <td>2667</td>\n",
       "      <td>0</td>\n",
       "      <td>47</td>\n",
       "      <td>-1</td>\n",
       "      <td>221</td>\n",
       "      <td>0.168836</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>100984</th>\n",
       "      <td>9.945913e+17</td>\n",
       "      <td>0</td>\n",
       "      <td>14103023</td>\n",
       "      <td>1005</td>\n",
       "      <td>0</td>\n",
       "      <td>85f751fd</td>\n",
       "      <td>c4e18dd6</td>\n",
       "      <td>50e219e0</td>\n",
       "      <td>54c5d545</td>\n",
       "      <td>2347f47a</td>\n",
       "      <td>...</td>\n",
       "      <td>2</td>\n",
       "      <td>17017</td>\n",
       "      <td>320</td>\n",
       "      <td>50</td>\n",
       "      <td>1873</td>\n",
       "      <td>3</td>\n",
       "      <td>39</td>\n",
       "      <td>-1</td>\n",
       "      <td>23</td>\n",
       "      <td>0.168836</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 25 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                  id  click      hour    C1  banner_pos   site_id site_domain  \\\n",
       "100980  9.766275e+18      1  14103023  1005           0  1fbe01fe    f3845767   \n",
       "100981  9.829227e+18      1  14103023  1005           0  1fbe01fe    f3845767   \n",
       "100982  9.877575e+18      0  14103023  1005           0  85f751fd    c4e18dd6   \n",
       "100983  9.911338e+18      0  14103023  1005           0  85f751fd    c4e18dd6   \n",
       "100984  9.945913e+17      0  14103023  1005           0  85f751fd    c4e18dd6   \n",
       "\n",
       "       site_category    app_id app_domain    ...     device_conn_type    C14  \\\n",
       "100980      28905ebd  ecad2386   7801e8d9    ...                    0  22676   \n",
       "100981      28905ebd  ecad2386   7801e8d9    ...                    0  22676   \n",
       "100982      50e219e0  7e7baafa   2347f47a    ...                    0  23866   \n",
       "100983      50e219e0  9c13b419   2347f47a    ...                    0  23161   \n",
       "100984      50e219e0  54c5d545   2347f47a    ...                    2  17017   \n",
       "\n",
       "        C15 C16   C17  C18  C19     C20  C21  hour_rate  \n",
       "100980  320  50  2616    0   35  100084   51   0.168836  \n",
       "100981  320  50  2616    0   35      -1   51   0.168836  \n",
       "100982  320  50  2736    0   33  100170  246   0.168836  \n",
       "100983  320  50  2667    0   47      -1  221   0.168836  \n",
       "100984  320  50  1873    3   39      -1   23   0.168836  \n",
       "\n",
       "[5 rows x 25 columns]"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tran1.tail()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Add the posterior click probability of every feature to the training set, replacing the original features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "       hour  avg(click)\n",
      "0  14102100    0.174714\n",
      "1  14102101    0.173695\n",
      "2  14102102    0.150696\n",
      "3  14102103    0.169791\n",
      "4  14102104    0.151206\n",
      "             id  click      hour    C1  banner_pos   site_id site_domain  \\\n",
      "0  1.005585e+19      0  14102100  1005           0  85f751fd    c4e18dd6   \n",
      "1  1.010206e+19      0  14102100  1005           0  85f751fd    c4e18dd6   \n",
      "2  1.020168e+19      0  14102100  1005           0  1fbe01fe    f3845767   \n",
      "3  1.025000e+19      1  14102100  1005           0  1fbe01fe    f3845767   \n",
      "4  1.052369e+19      0  14102100  1005           0  85f751fd    c4e18dd6   \n",
      "\n",
      "  site_category    app_id app_domain     ...     device_conn_type    C14  C15  \\\n",
      "0      50e219e0  0acbeaa3   45a51db4     ...                    0  20596  320   \n",
      "1      50e219e0  51cedd4e   aefc06bd     ...                    0  18648  320   \n",
      "2      28905ebd  ecad2386   7801e8d9     ...                    0  15703  320   \n",
      "3      28905ebd  ecad2386   7801e8d9     ...                    2  15702  320   \n",
      "4      50e219e0  0acbeaa3   45a51db4     ...                    0  20596  320   \n",
      "\n",
      "  C16   C17  C18   C19     C20  C21  avg(click)  \n",
      "0  50  2161    0    35      -1  157    0.174714  \n",
      "1  50  1092    3  1835  100156   61    0.174714  \n",
      "2  50  1722    0    35      -1   79    0.174714  \n",
      "3  50  1722    0    35  100084   79    0.174714  \n",
      "4  50  2161    0    35  100148  157    0.174714  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click      hour    C1  banner_pos   site_id site_domain  \\\n",
      "0  1.005585e+19      0  14102100  1005           0  85f751fd    c4e18dd6   \n",
      "1  1.010206e+19      0  14102100  1005           0  85f751fd    c4e18dd6   \n",
      "2  1.020168e+19      0  14102100  1005           0  1fbe01fe    f3845767   \n",
      "3  1.025000e+19      1  14102100  1005           0  1fbe01fe    f3845767   \n",
      "4  1.052369e+19      0  14102100  1005           0  85f751fd    c4e18dd6   \n",
      "\n",
      "  site_category    app_id app_domain    ...     device_conn_type    C14  C15  \\\n",
      "0      50e219e0  0acbeaa3   45a51db4    ...                    0  20596  320   \n",
      "1      50e219e0  51cedd4e   aefc06bd    ...                    0  18648  320   \n",
      "2      28905ebd  ecad2386   7801e8d9    ...                    0  15703  320   \n",
      "3      28905ebd  ecad2386   7801e8d9    ...                    2  15702  320   \n",
      "4      50e219e0  0acbeaa3   45a51db4    ...                    0  20596  320   \n",
      "\n",
      "  C16   C17  C18   C19     C20  C21  hour_rate  \n",
      "0  50  2161    0    35      -1  157   0.174714  \n",
      "1  50  1092    3  1835  100156   61   0.174714  \n",
      "2  50  1722    0    35      -1   79   0.174714  \n",
      "3  50  1722    0    35  100084   79   0.174714  \n",
      "4  50  2161    0    35  100148  157   0.174714  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click    C1  banner_pos   site_id site_domain site_category  \\\n",
      "0  1.005585e+19      0  1005           0  85f751fd    c4e18dd6      50e219e0   \n",
      "1  1.010206e+19      0  1005           0  85f751fd    c4e18dd6      50e219e0   \n",
      "2  1.020168e+19      0  1005           0  1fbe01fe    f3845767      28905ebd   \n",
      "3  1.025000e+19      1  1005           0  1fbe01fe    f3845767      28905ebd   \n",
      "4  1.052369e+19      0  1005           0  85f751fd    c4e18dd6      50e219e0   \n",
      "\n",
      "     app_id app_domain app_category    ...     device_conn_type    C14  C15  \\\n",
      "0  0acbeaa3   45a51db4     f95efa07    ...                    0  20596  320   \n",
      "1  51cedd4e   aefc06bd     0f2161f8    ...                    0  18648  320   \n",
      "2  ecad2386   7801e8d9     07d7df22    ...                    0  15703  320   \n",
      "3  ecad2386   7801e8d9     07d7df22    ...                    2  15702  320   \n",
      "4  0acbeaa3   45a51db4     f95efa07    ...                    0  20596  320   \n",
      "\n",
      "   C16   C17  C18   C19     C20  C21  hour_rate  \n",
      "0   50  2161    0    35      -1  157   0.174714  \n",
      "1   50  1092    3  1835  100156   61   0.174714  \n",
      "2   50  1722    0    35      -1   79   0.174714  \n",
      "3   50  1722    0    35  100084   79   0.174714  \n",
      "4   50  2161    0    35  100148  157   0.174714  \n",
      "\n",
      "[5 rows x 24 columns]\n",
      "     C1  avg(click)\n",
      "0  1001    0.033393\n",
      "1  1002    0.210731\n",
      "2  1005    0.169331\n",
      "3  1007    0.039429\n",
      "4  1008    0.121652\n",
      "             id  click    C1  banner_pos   site_id site_domain site_category  \\\n",
      "0  1.005585e+19      0  1005           0  85f751fd    c4e18dd6      50e219e0   \n",
      "1  1.010206e+19      0  1005           0  85f751fd    c4e18dd6      50e219e0   \n",
      "2  1.020168e+19      0  1005           0  1fbe01fe    f3845767      28905ebd   \n",
      "3  1.025000e+19      1  1005           0  1fbe01fe    f3845767      28905ebd   \n",
      "4  1.052369e+19      0  1005           0  85f751fd    c4e18dd6      50e219e0   \n",
      "\n",
      "     app_id app_domain app_category     ...        C14  C15 C16   C17  C18  \\\n",
      "0  0acbeaa3   45a51db4     f95efa07     ...      20596  320  50  2161    0   \n",
      "1  51cedd4e   aefc06bd     0f2161f8     ...      18648  320  50  1092    3   \n",
      "2  ecad2386   7801e8d9     07d7df22     ...      15703  320  50  1722    0   \n",
      "3  ecad2386   7801e8d9     07d7df22     ...      15702  320  50  1722    0   \n",
      "4  0acbeaa3   45a51db4     f95efa07     ...      20596  320  50  2161    0   \n",
      "\n",
      "    C19     C20  C21  hour_rate  avg(click)  \n",
      "0    35      -1  157   0.174714    0.169331  \n",
      "1  1835  100156   61   0.174714    0.169331  \n",
      "2    35      -1   79   0.174714    0.169331  \n",
      "3    35  100084   79   0.174714    0.169331  \n",
      "4    35  100148  157   0.174714    0.169331  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click    C1  banner_pos   site_id site_domain site_category  \\\n",
      "0  1.005585e+19      0  1005           0  85f751fd    c4e18dd6      50e219e0   \n",
      "1  1.010206e+19      0  1005           0  85f751fd    c4e18dd6      50e219e0   \n",
      "2  1.020168e+19      0  1005           0  1fbe01fe    f3845767      28905ebd   \n",
      "3  1.025000e+19      1  1005           0  1fbe01fe    f3845767      28905ebd   \n",
      "4  1.052369e+19      0  1005           0  85f751fd    c4e18dd6      50e219e0   \n",
      "\n",
      "     app_id app_domain app_category    ...       C14  C15 C16   C17  C18  \\\n",
      "0  0acbeaa3   45a51db4     f95efa07    ...     20596  320  50  2161    0   \n",
      "1  51cedd4e   aefc06bd     0f2161f8    ...     18648  320  50  1092    3   \n",
      "2  ecad2386   7801e8d9     07d7df22    ...     15703  320  50  1722    0   \n",
      "3  ecad2386   7801e8d9     07d7df22    ...     15702  320  50  1722    0   \n",
      "4  0acbeaa3   45a51db4     f95efa07    ...     20596  320  50  2161    0   \n",
      "\n",
      "    C19     C20  C21  hour_rate   C1_rate  \n",
      "0    35      -1  157   0.174714  0.169331  \n",
      "1  1835  100156   61   0.174714  0.169331  \n",
      "2    35      -1   79   0.174714  0.169331  \n",
      "3    35  100084   79   0.174714  0.169331  \n",
      "4    35  100148  157   0.174714  0.169331  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click  banner_pos   site_id site_domain site_category  \\\n",
      "0  1.005585e+19      0           0  85f751fd    c4e18dd6      50e219e0   \n",
      "1  1.010206e+19      0           0  85f751fd    c4e18dd6      50e219e0   \n",
      "2  1.020168e+19      0           0  1fbe01fe    f3845767      28905ebd   \n",
      "3  1.025000e+19      1           0  1fbe01fe    f3845767      28905ebd   \n",
      "4  1.052369e+19      0           0  85f751fd    c4e18dd6      50e219e0   \n",
      "\n",
      "     app_id app_domain app_category device_id    ...       C14  C15  C16  \\\n",
      "0  0acbeaa3   45a51db4     f95efa07  a99f214a    ...     20596  320   50   \n",
      "1  51cedd4e   aefc06bd     0f2161f8  a99f214a    ...     18648  320   50   \n",
      "2  ecad2386   7801e8d9     07d7df22  a99f214a    ...     15703  320   50   \n",
      "3  ecad2386   7801e8d9     07d7df22  a99f214a    ...     15702  320   50   \n",
      "4  0acbeaa3   45a51db4     f95efa07  a99f214a    ...     20596  320   50   \n",
      "\n",
      "    C17  C18   C19     C20  C21  hour_rate   C1_rate  \n",
      "0  2161    0    35      -1  157   0.174714  0.169331  \n",
      "1  1092    3  1835  100156   61   0.174714  0.169331  \n",
      "2  1722    0    35      -1   79   0.174714  0.169331  \n",
      "3  1722    0    35  100084   79   0.174714  0.169331  \n",
      "4  2161    0    35  100148  157   0.174714  0.169331  \n",
      "\n",
      "[5 rows x 24 columns]\n",
      "   banner_pos  avg(click)\n",
      "0           0    0.164272\n",
      "1           1    0.183614\n",
      "2           2    0.119222\n",
      "3           3    0.182801\n",
      "4           4    0.185358\n",
      "             id  click  banner_pos   site_id site_domain site_category  \\\n",
      "0  1.005585e+19      0           0  85f751fd    c4e18dd6      50e219e0   \n",
      "1  1.010206e+19      0           0  85f751fd    c4e18dd6      50e219e0   \n",
      "2  1.020168e+19      0           0  1fbe01fe    f3845767      28905ebd   \n",
      "3  1.025000e+19      1           0  1fbe01fe    f3845767      28905ebd   \n",
      "4  1.052369e+19      0           0  85f751fd    c4e18dd6      50e219e0   \n",
      "\n",
      "     app_id app_domain app_category device_id     ...      C15 C16   C17  C18  \\\n",
      "0  0acbeaa3   45a51db4     f95efa07  a99f214a     ...      320  50  2161    0   \n",
      "1  51cedd4e   aefc06bd     0f2161f8  a99f214a     ...      320  50  1092    3   \n",
      "2  ecad2386   7801e8d9     07d7df22  a99f214a     ...      320  50  1722    0   \n",
      "3  ecad2386   7801e8d9     07d7df22  a99f214a     ...      320  50  1722    0   \n",
      "4  0acbeaa3   45a51db4     f95efa07  a99f214a     ...      320  50  2161    0   \n",
      "\n",
      "    C19     C20  C21  hour_rate   C1_rate  avg(click)  \n",
      "0    35      -1  157   0.174714  0.169331    0.164272  \n",
      "1  1835  100156   61   0.174714  0.169331    0.164272  \n",
      "2    35      -1   79   0.174714  0.169331    0.164272  \n",
      "3    35  100084   79   0.174714  0.169331    0.164272  \n",
      "4    35  100148  157   0.174714  0.169331    0.164272  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click  banner_pos   site_id site_domain site_category  \\\n",
      "0  1.005585e+19      0           0  85f751fd    c4e18dd6      50e219e0   \n",
      "1  1.010206e+19      0           0  85f751fd    c4e18dd6      50e219e0   \n",
      "2  1.020168e+19      0           0  1fbe01fe    f3845767      28905ebd   \n",
      "3  1.025000e+19      1           0  1fbe01fe    f3845767      28905ebd   \n",
      "4  1.052369e+19      0           0  85f751fd    c4e18dd6      50e219e0   \n",
      "\n",
      "     app_id app_domain app_category device_id       ...         C15 C16   C17  \\\n",
      "0  0acbeaa3   45a51db4     f95efa07  a99f214a       ...         320  50  2161   \n",
      "1  51cedd4e   aefc06bd     0f2161f8  a99f214a       ...         320  50  1092   \n",
      "2  ecad2386   7801e8d9     07d7df22  a99f214a       ...         320  50  1722   \n",
      "3  ecad2386   7801e8d9     07d7df22  a99f214a       ...         320  50  1722   \n",
      "4  0acbeaa3   45a51db4     f95efa07  a99f214a       ...         320  50  2161   \n",
      "\n",
      "   C18   C19     C20  C21  hour_rate   C1_rate  banner_pos_rate  \n",
      "0    0    35      -1  157   0.174714  0.169331         0.164272  \n",
      "1    3  1835  100156   61   0.174714  0.169331         0.164272  \n",
      "2    0    35      -1   79   0.174714  0.169331         0.164272  \n",
      "3    0    35  100084   79   0.174714  0.169331         0.164272  \n",
      "4    0    35  100148  157   0.174714  0.169331         0.164272  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click   site_id site_domain site_category    app_id  \\\n",
      "0  1.005585e+19      0  85f751fd    c4e18dd6      50e219e0  0acbeaa3   \n",
      "1  1.010206e+19      0  85f751fd    c4e18dd6      50e219e0  51cedd4e   \n",
      "2  1.020168e+19      0  1fbe01fe    f3845767      28905ebd  ecad2386   \n",
      "3  1.025000e+19      1  1fbe01fe    f3845767      28905ebd  ecad2386   \n",
      "4  1.052369e+19      0  85f751fd    c4e18dd6      50e219e0  0acbeaa3   \n",
      "\n",
      "  app_domain app_category device_id device_ip       ...         C15  C16  \\\n",
      "0   45a51db4     f95efa07  a99f214a  7db30ee7       ...         320   50   \n",
      "1   aefc06bd     0f2161f8  a99f214a  50d510f0       ...         320   50   \n",
      "2   7801e8d9     07d7df22  a99f214a  39128688       ...         320   50   \n",
      "3   7801e8d9     07d7df22  a99f214a  da139835       ...         320   50   \n",
      "4   45a51db4     f95efa07  a99f214a  bbeeb866       ...         320   50   \n",
      "\n",
      "    C17  C18   C19     C20  C21  hour_rate   C1_rate  banner_pos_rate  \n",
      "0  2161    0    35      -1  157   0.174714  0.169331         0.164272  \n",
      "1  1092    3  1835  100156   61   0.174714  0.169331         0.164272  \n",
      "2  1722    0    35      -1   79   0.174714  0.169331         0.164272  \n",
      "3  1722    0    35  100084   79   0.174714  0.169331         0.164272  \n",
      "4  2161    0    35  100148  157   0.174714  0.169331         0.164272  \n",
      "\n",
      "[5 rows x 24 columns]\n",
      "    site_id  avg(click)\n",
      "0  000aa1a4    1.000000\n",
      "1  00255fb4    0.068558\n",
      "2  003cf93d    0.152778\n",
      "3  00476056    0.142857\n",
      "4  00564467    0.000000\n",
      "             id  click   site_id site_domain site_category    app_id  \\\n",
      "0  1.005585e+19      0  85f751fd    c4e18dd6      50e219e0  0acbeaa3   \n",
      "1  1.010206e+19      0  85f751fd    c4e18dd6      50e219e0  51cedd4e   \n",
      "2  1.052369e+19      0  85f751fd    c4e18dd6      50e219e0  0acbeaa3   \n",
      "3  1.066172e+19      0  85f751fd    c4e18dd6      50e219e0  5adb10d9   \n",
      "4  1.113486e+19      0  85f751fd    c4e18dd6      50e219e0  d33d55c4   \n",
      "\n",
      "  app_domain app_category device_id device_ip     ...     C16   C17  C18  \\\n",
      "0   45a51db4     f95efa07  a99f214a  7db30ee7     ...      50  2161    0   \n",
      "1   aefc06bd     0f2161f8  a99f214a  50d510f0     ...      50  1092    3   \n",
      "2   45a51db4     f95efa07  a99f214a  bbeeb866     ...      50  2161    0   \n",
      "3   2347f47a     cef3e649  140e83db  f9c89d31     ...      50  2161    0   \n",
      "4   d9b5648e     0f2161f8  a99f214a  8e6260e5     ...      50  2434    3   \n",
      "\n",
      "    C19     C20  C21  hour_rate   C1_rate  banner_pos_rate  avg(click)  \n",
      "0    35      -1  157   0.174714  0.169331         0.164272    0.118826  \n",
      "1  1835  100156   61   0.174714  0.169331         0.164272    0.118826  \n",
      "2    35  100148  157   0.174714  0.169331         0.164272    0.118826  \n",
      "3    35      -1  157   0.174714  0.169331         0.164272    0.118826  \n",
      "4   163  100088   61   0.174714  0.169331         0.164272    0.118826  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click   site_id site_domain site_category    app_id  \\\n",
      "0  1.005585e+19      0  85f751fd    c4e18dd6      50e219e0  0acbeaa3   \n",
      "1  1.010206e+19      0  85f751fd    c4e18dd6      50e219e0  51cedd4e   \n",
      "2  1.052369e+19      0  85f751fd    c4e18dd6      50e219e0  0acbeaa3   \n",
      "3  1.066172e+19      0  85f751fd    c4e18dd6      50e219e0  5adb10d9   \n",
      "4  1.113486e+19      0  85f751fd    c4e18dd6      50e219e0  d33d55c4   \n",
      "\n",
      "  app_domain app_category device_id device_ip      ...      C16   C17  C18  \\\n",
      "0   45a51db4     f95efa07  a99f214a  7db30ee7      ...       50  2161    0   \n",
      "1   aefc06bd     0f2161f8  a99f214a  50d510f0      ...       50  1092    3   \n",
      "2   45a51db4     f95efa07  a99f214a  bbeeb866      ...       50  2161    0   \n",
      "3   2347f47a     cef3e649  140e83db  f9c89d31      ...       50  2161    0   \n",
      "4   d9b5648e     0f2161f8  a99f214a  8e6260e5      ...       50  2434    3   \n",
      "\n",
      "    C19     C20  C21  hour_rate   C1_rate  banner_pos_rate  site_id_rate  \n",
      "0    35      -1  157   0.174714  0.169331         0.164272      0.118826  \n",
      "1  1835  100156   61   0.174714  0.169331         0.164272      0.118826  \n",
      "2    35  100148  157   0.174714  0.169331         0.164272      0.118826  \n",
      "3    35      -1  157   0.174714  0.169331         0.164272      0.118826  \n",
      "4   163  100088   61   0.174714  0.169331         0.164272      0.118826  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click site_domain site_category    app_id app_domain  \\\n",
      "0  1.005585e+19      0    c4e18dd6      50e219e0  0acbeaa3   45a51db4   \n",
      "1  1.010206e+19      0    c4e18dd6      50e219e0  51cedd4e   aefc06bd   \n",
      "2  1.052369e+19      0    c4e18dd6      50e219e0  0acbeaa3   45a51db4   \n",
      "3  1.066172e+19      0    c4e18dd6      50e219e0  5adb10d9   2347f47a   \n",
      "4  1.113486e+19      0    c4e18dd6      50e219e0  d33d55c4   d9b5648e   \n",
      "\n",
      "  app_category device_id device_ip device_model      ...       C16   C17  C18  \\\n",
      "0     f95efa07  a99f214a  7db30ee7     9f8d0424      ...        50  2161    0   \n",
      "1     0f2161f8  a99f214a  50d510f0     bbeedfee      ...        50  1092    3   \n",
      "2     f95efa07  a99f214a  bbeeb866     fce66524      ...        50  2161    0   \n",
      "3     cef3e649  140e83db  f9c89d31     dc356277      ...        50  2161    0   \n",
      "4     0f2161f8  a99f214a  8e6260e5     d056b4bf      ...        50  2434    3   \n",
      "\n",
      "    C19     C20  C21  hour_rate   C1_rate  banner_pos_rate  site_id_rate  \n",
      "0    35      -1  157   0.174714  0.169331         0.164272      0.118826  \n",
      "1  1835  100156   61   0.174714  0.169331         0.164272      0.118826  \n",
      "2    35  100148  157   0.174714  0.169331         0.164272      0.118826  \n",
      "3    35      -1  157   0.174714  0.169331         0.164272      0.118826  \n",
      "4   163  100088   61   0.174714  0.169331         0.164272      0.118826  \n",
      "\n",
      "[5 rows x 24 columns]\n",
      "  site_domain  avg(click)\n",
      "0    000129ff    0.000000\n",
      "1    0035f25a    0.000000\n",
      "2    004d30ed    0.000000\n",
      "3    005b4641    1.000000\n",
      "4    005b495a    0.359194\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "             id  click site_domain site_category    app_id app_domain  \\\n",
      "0  1.005585e+19      0    c4e18dd6      50e219e0  0acbeaa3   45a51db4   \n",
      "1  1.010206e+19      0    c4e18dd6      50e219e0  51cedd4e   aefc06bd   \n",
      "2  1.052369e+19      0    c4e18dd6      50e219e0  0acbeaa3   45a51db4   \n",
      "3  1.066172e+19      0    c4e18dd6      50e219e0  5adb10d9   2347f47a   \n",
      "4  1.113486e+19      0    c4e18dd6      50e219e0  d33d55c4   d9b5648e   \n",
      "\n",
      "  app_category device_id device_ip device_model     ...       C17  C18   C19  \\\n",
      "0     f95efa07  a99f214a  7db30ee7     9f8d0424     ...      2161    0    35   \n",
      "1     0f2161f8  a99f214a  50d510f0     bbeedfee     ...      1092    3  1835   \n",
      "2     f95efa07  a99f214a  bbeeb866     fce66524     ...      2161    0    35   \n",
      "3     cef3e649  140e83db  f9c89d31     dc356277     ...      2161    0    35   \n",
      "4     0f2161f8  a99f214a  8e6260e5     d056b4bf     ...      2434    3   163   \n",
      "\n",
      "      C20  C21  hour_rate   C1_rate  banner_pos_rate  site_id_rate  avg(click)  \n",
      "0      -1  157   0.174714  0.169331         0.164272      0.118826     0.12275  \n",
      "1  100156   61   0.174714  0.169331         0.164272      0.118826     0.12275  \n",
      "2  100148  157   0.174714  0.169331         0.164272      0.118826     0.12275  \n",
      "3      -1  157   0.174714  0.169331         0.164272      0.118826     0.12275  \n",
      "4  100088   61   0.174714  0.169331         0.164272      0.118826     0.12275  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click site_domain site_category    app_id app_domain  \\\n",
      "0  1.005585e+19      0    c4e18dd6      50e219e0  0acbeaa3   45a51db4   \n",
      "1  1.010206e+19      0    c4e18dd6      50e219e0  51cedd4e   aefc06bd   \n",
      "2  1.052369e+19      0    c4e18dd6      50e219e0  0acbeaa3   45a51db4   \n",
      "3  1.066172e+19      0    c4e18dd6      50e219e0  5adb10d9   2347f47a   \n",
      "4  1.113486e+19      0    c4e18dd6      50e219e0  d33d55c4   d9b5648e   \n",
      "\n",
      "  app_category device_id device_ip device_model        ...          C17  C18  \\\n",
      "0     f95efa07  a99f214a  7db30ee7     9f8d0424        ...         2161    0   \n",
      "1     0f2161f8  a99f214a  50d510f0     bbeedfee        ...         1092    3   \n",
      "2     f95efa07  a99f214a  bbeeb866     fce66524        ...         2161    0   \n",
      "3     cef3e649  140e83db  f9c89d31     dc356277        ...         2161    0   \n",
      "4     0f2161f8  a99f214a  8e6260e5     d056b4bf        ...         2434    3   \n",
      "\n",
      "    C19     C20  C21  hour_rate   C1_rate  banner_pos_rate  site_id_rate  \\\n",
      "0    35      -1  157   0.174714  0.169331         0.164272      0.118826   \n",
      "1  1835  100156   61   0.174714  0.169331         0.164272      0.118826   \n",
      "2    35  100148  157   0.174714  0.169331         0.164272      0.118826   \n",
      "3    35      -1  157   0.174714  0.169331         0.164272      0.118826   \n",
      "4   163  100088   61   0.174714  0.169331         0.164272      0.118826   \n",
      "\n",
      "   site_domain_rate  \n",
      "0           0.12275  \n",
      "1           0.12275  \n",
      "2           0.12275  \n",
      "3           0.12275  \n",
      "4           0.12275  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click site_category    app_id app_domain app_category  \\\n",
      "0  1.005585e+19      0      50e219e0  0acbeaa3   45a51db4     f95efa07   \n",
      "1  1.010206e+19      0      50e219e0  51cedd4e   aefc06bd     0f2161f8   \n",
      "2  1.052369e+19      0      50e219e0  0acbeaa3   45a51db4     f95efa07   \n",
      "3  1.066172e+19      0      50e219e0  5adb10d9   2347f47a     cef3e649   \n",
      "4  1.113486e+19      0      50e219e0  d33d55c4   d9b5648e     0f2161f8   \n",
      "\n",
      "  device_id device_ip device_model  device_type        ...          C17  C18  \\\n",
      "0  a99f214a  7db30ee7     9f8d0424            1        ...         2161    0   \n",
      "1  a99f214a  50d510f0     bbeedfee            1        ...         1092    3   \n",
      "2  a99f214a  bbeeb866     fce66524            1        ...         2161    0   \n",
      "3  140e83db  f9c89d31     dc356277            1        ...         2161    0   \n",
      "4  a99f214a  8e6260e5     d056b4bf            1        ...         2434    3   \n",
      "\n",
      "    C19     C20  C21  hour_rate   C1_rate  banner_pos_rate  site_id_rate  \\\n",
      "0    35      -1  157   0.174714  0.169331         0.164272      0.118826   \n",
      "1  1835  100156   61   0.174714  0.169331         0.164272      0.118826   \n",
      "2    35  100148  157   0.174714  0.169331         0.164272      0.118826   \n",
      "3    35      -1  157   0.174714  0.169331         0.164272      0.118826   \n",
      "4   163  100088   61   0.174714  0.169331         0.164272      0.118826   \n",
      "\n",
      "   site_domain_rate  \n",
      "0           0.12275  \n",
      "1           0.12275  \n",
      "2           0.12275  \n",
      "3           0.12275  \n",
      "4           0.12275  \n",
      "\n",
      "[5 rows x 24 columns]\n",
      "  site_category  avg(click)\n",
      "0      0569f928    0.053958\n",
      "1      110ab22d    0.000000\n",
      "2      28905ebd    0.208019\n",
      "3      335d28a8    0.093644\n",
      "4      3e814130    0.283003\n",
      "             id  click site_category    app_id app_domain app_category  \\\n",
      "0  1.005585e+19      0      50e219e0  0acbeaa3   45a51db4     f95efa07   \n",
      "1  1.010206e+19      0      50e219e0  51cedd4e   aefc06bd     0f2161f8   \n",
      "2  1.052369e+19      0      50e219e0  0acbeaa3   45a51db4     f95efa07   \n",
      "3  1.066172e+19      0      50e219e0  5adb10d9   2347f47a     cef3e649   \n",
      "4  1.113486e+19      0      50e219e0  d33d55c4   d9b5648e     0f2161f8   \n",
      "\n",
      "  device_id device_ip device_model  device_type     ...      C18   C19  \\\n",
      "0  a99f214a  7db30ee7     9f8d0424            1     ...        0    35   \n",
      "1  a99f214a  50d510f0     bbeedfee            1     ...        3  1835   \n",
      "2  a99f214a  bbeeb866     fce66524            1     ...        0    35   \n",
      "3  140e83db  f9c89d31     dc356277            1     ...        0    35   \n",
      "4  a99f214a  8e6260e5     d056b4bf            1     ...        3   163   \n",
      "\n",
      "      C20  C21  hour_rate   C1_rate  banner_pos_rate  site_id_rate  \\\n",
      "0      -1  157   0.174714  0.169331         0.164272      0.118826   \n",
      "1  100156   61   0.174714  0.169331         0.164272      0.118826   \n",
      "2  100148  157   0.174714  0.169331         0.164272      0.118826   \n",
      "3      -1  157   0.174714  0.169331         0.164272      0.118826   \n",
      "4  100088   61   0.174714  0.169331         0.164272      0.118826   \n",
      "\n",
      "   site_domain_rate  avg(click)  \n",
      "0           0.12275     0.12858  \n",
      "1           0.12275     0.12858  \n",
      "2           0.12275     0.12858  \n",
      "3           0.12275     0.12858  \n",
      "4           0.12275     0.12858  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click site_category    app_id app_domain app_category  \\\n",
      "0  1.005585e+19      0      50e219e0  0acbeaa3   45a51db4     f95efa07   \n",
      "1  1.010206e+19      0      50e219e0  51cedd4e   aefc06bd     0f2161f8   \n",
      "2  1.052369e+19      0      50e219e0  0acbeaa3   45a51db4     f95efa07   \n",
      "3  1.066172e+19      0      50e219e0  5adb10d9   2347f47a     cef3e649   \n",
      "4  1.113486e+19      0      50e219e0  d33d55c4   d9b5648e     0f2161f8   \n",
      "\n",
      "  device_id device_ip device_model  device_type         ...          C18  \\\n",
      "0  a99f214a  7db30ee7     9f8d0424            1         ...            0   \n",
      "1  a99f214a  50d510f0     bbeedfee            1         ...            3   \n",
      "2  a99f214a  bbeeb866     fce66524            1         ...            0   \n",
      "3  140e83db  f9c89d31     dc356277            1         ...            0   \n",
      "4  a99f214a  8e6260e5     d056b4bf            1         ...            3   \n",
      "\n",
      "    C19     C20  C21  hour_rate   C1_rate  banner_pos_rate  site_id_rate  \\\n",
      "0    35      -1  157   0.174714  0.169331         0.164272      0.118826   \n",
      "1  1835  100156   61   0.174714  0.169331         0.164272      0.118826   \n",
      "2    35  100148  157   0.174714  0.169331         0.164272      0.118826   \n",
      "3    35      -1  157   0.174714  0.169331         0.164272      0.118826   \n",
      "4   163  100088   61   0.174714  0.169331         0.164272      0.118826   \n",
      "\n",
      "   site_domain_rate  site_category_rate  \n",
      "0           0.12275             0.12858  \n",
      "1           0.12275             0.12858  \n",
      "2           0.12275             0.12858  \n",
      "3           0.12275             0.12858  \n",
      "4           0.12275             0.12858  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click    app_id app_domain app_category device_id device_ip  \\\n",
      "0  1.005585e+19      0  0acbeaa3   45a51db4     f95efa07  a99f214a  7db30ee7   \n",
      "1  1.010206e+19      0  51cedd4e   aefc06bd     0f2161f8  a99f214a  50d510f0   \n",
      "2  1.052369e+19      0  0acbeaa3   45a51db4     f95efa07  a99f214a  bbeeb866   \n",
      "3  1.066172e+19      0  5adb10d9   2347f47a     cef3e649  140e83db  f9c89d31   \n",
      "4  1.113486e+19      0  d33d55c4   d9b5648e     0f2161f8  a99f214a  8e6260e5   \n",
      "\n",
      "  device_model  device_type  device_conn_type         ...          C18   C19  \\\n",
      "0     9f8d0424            1                 0         ...            0    35   \n",
      "1     bbeedfee            1                 0         ...            3  1835   \n",
      "2     fce66524            1                 0         ...            0    35   \n",
      "3     dc356277            1                 2         ...            0    35   \n",
      "4     d056b4bf            1                 0         ...            3   163   \n",
      "\n",
      "      C20  C21  hour_rate   C1_rate  banner_pos_rate  site_id_rate  \\\n",
      "0      -1  157   0.174714  0.169331         0.164272      0.118826   \n",
      "1  100156   61   0.174714  0.169331         0.164272      0.118826   \n",
      "2  100148  157   0.174714  0.169331         0.164272      0.118826   \n",
      "3      -1  157   0.174714  0.169331         0.164272      0.118826   \n",
      "4  100088   61   0.174714  0.169331         0.164272      0.118826   \n",
      "\n",
      "   site_domain_rate  site_category_rate  \n",
      "0           0.12275             0.12858  \n",
      "1           0.12275             0.12858  \n",
      "2           0.12275             0.12858  \n",
      "3           0.12275             0.12858  \n",
      "4           0.12275             0.12858  \n",
      "\n",
      "[5 rows x 24 columns]\n",
      "     app_id  avg(click)\n",
      "0  000d6291    0.021978\n",
      "1  000f21f1    0.000000\n",
      "2  00110ae2    0.111111\n",
      "3  00119fc5    0.000000\n",
      "4  0014fe4d    0.000000\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "             id  click    app_id app_domain app_category device_id device_ip  \\\n",
      "0  1.005585e+19      0  0acbeaa3   45a51db4     f95efa07  a99f214a  7db30ee7   \n",
      "1  1.052369e+19      0  0acbeaa3   45a51db4     f95efa07  a99f214a  bbeeb866   \n",
      "2  2.566852e+18      0  0acbeaa3   45a51db4     f95efa07  a99f214a  74e435bd   \n",
      "3  1.391783e+19      0  0acbeaa3   45a51db4     f95efa07  a99f214a  1ecc23fa   \n",
      "4  1.631158e+19      0  0acbeaa3   45a51db4     f95efa07  a99f214a  1e94c42f   \n",
      "\n",
      "  device_model  device_type  device_conn_type     ...      C19     C20  C21  \\\n",
      "0     9f8d0424            1                 0     ...       35      -1  157   \n",
      "1     fce66524            1                 0     ...       35  100148  157   \n",
      "2     5e12edef            1                 0     ...       35  100034  157   \n",
      "3     7fe9fa2c            1                 0     ...       35  100034  157   \n",
      "4     405444ca            1                 0     ...      547      -1   51   \n",
      "\n",
      "   hour_rate   C1_rate  banner_pos_rate  site_id_rate  site_domain_rate  \\\n",
      "0   0.174714  0.169331         0.164272      0.118826           0.12275   \n",
      "1   0.174714  0.169331         0.164272      0.118826           0.12275   \n",
      "2   0.174714  0.169331         0.164272      0.118826           0.12275   \n",
      "3   0.173695  0.169331         0.164272      0.118826           0.12275   \n",
      "4   0.173695  0.169331         0.164272      0.118826           0.12275   \n",
      "\n",
      "   site_category_rate  avg(click)  \n",
      "0             0.12858    0.201129  \n",
      "1             0.12858    0.201129  \n",
      "2             0.12858    0.201129  \n",
      "3             0.12858    0.201129  \n",
      "4             0.12858    0.201129  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click    app_id app_domain app_category device_id device_ip  \\\n",
      "0  1.005585e+19      0  0acbeaa3   45a51db4     f95efa07  a99f214a  7db30ee7   \n",
      "1  1.052369e+19      0  0acbeaa3   45a51db4     f95efa07  a99f214a  bbeeb866   \n",
      "2  2.566852e+18      0  0acbeaa3   45a51db4     f95efa07  a99f214a  74e435bd   \n",
      "3  1.391783e+19      0  0acbeaa3   45a51db4     f95efa07  a99f214a  1ecc23fa   \n",
      "4  1.631158e+19      0  0acbeaa3   45a51db4     f95efa07  a99f214a  1e94c42f   \n",
      "\n",
      "  device_model  device_type  device_conn_type     ...       C19     C20  C21  \\\n",
      "0     9f8d0424            1                 0     ...        35      -1  157   \n",
      "1     fce66524            1                 0     ...        35  100148  157   \n",
      "2     5e12edef            1                 0     ...        35  100034  157   \n",
      "3     7fe9fa2c            1                 0     ...        35  100034  157   \n",
      "4     405444ca            1                 0     ...       547      -1   51   \n",
      "\n",
      "   hour_rate   C1_rate  banner_pos_rate  site_id_rate  site_domain_rate  \\\n",
      "0   0.174714  0.169331         0.164272      0.118826           0.12275   \n",
      "1   0.174714  0.169331         0.164272      0.118826           0.12275   \n",
      "2   0.174714  0.169331         0.164272      0.118826           0.12275   \n",
      "3   0.173695  0.169331         0.164272      0.118826           0.12275   \n",
      "4   0.173695  0.169331         0.164272      0.118826           0.12275   \n",
      "\n",
      "   site_category_rate  app_id_rate  \n",
      "0             0.12858     0.201129  \n",
      "1             0.12858     0.201129  \n",
      "2             0.12858     0.201129  \n",
      "3             0.12858     0.201129  \n",
      "4             0.12858     0.201129  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click app_domain app_category device_id device_ip  \\\n",
      "0  1.005585e+19      0   45a51db4     f95efa07  a99f214a  7db30ee7   \n",
      "1  1.052369e+19      0   45a51db4     f95efa07  a99f214a  bbeeb866   \n",
      "2  2.566852e+18      0   45a51db4     f95efa07  a99f214a  74e435bd   \n",
      "3  1.391783e+19      0   45a51db4     f95efa07  a99f214a  1ecc23fa   \n",
      "4  1.631158e+19      0   45a51db4     f95efa07  a99f214a  1e94c42f   \n",
      "\n",
      "  device_model  device_type  device_conn_type    C14     ...       C19  \\\n",
      "0     9f8d0424            1                 0  20596     ...        35   \n",
      "1     fce66524            1                 0  20596     ...        35   \n",
      "2     5e12edef            1                 0  18993     ...        35   \n",
      "3     7fe9fa2c            1                 0  20596     ...        35   \n",
      "4     405444ca            1                 0  21647     ...       547   \n",
      "\n",
      "      C20  C21  hour_rate   C1_rate  banner_pos_rate  site_id_rate  \\\n",
      "0      -1  157   0.174714  0.169331         0.164272      0.118826   \n",
      "1  100148  157   0.174714  0.169331         0.164272      0.118826   \n",
      "2  100034  157   0.174714  0.169331         0.164272      0.118826   \n",
      "3  100034  157   0.173695  0.169331         0.164272      0.118826   \n",
      "4      -1   51   0.173695  0.169331         0.164272      0.118826   \n",
      "\n",
      "   site_domain_rate  site_category_rate  app_id_rate  \n",
      "0           0.12275             0.12858     0.201129  \n",
      "1           0.12275             0.12858     0.201129  \n",
      "2           0.12275             0.12858     0.201129  \n",
      "3           0.12275             0.12858     0.201129  \n",
      "4           0.12275             0.12858     0.201129  \n",
      "\n",
      "[5 rows x 24 columns]\n",
      "  app_domain  avg(click)\n",
      "0   001b87ae         0.0\n",
      "1   002e4064         0.0\n",
      "2   00314725         0.0\n",
      "3   030e4250         0.0\n",
      "4   03da86e1         0.0\n",
      "             id  click app_domain app_category device_id device_ip  \\\n",
      "0  1.005585e+19      0   45a51db4     f95efa07  a99f214a  7db30ee7   \n",
      "1  1.052369e+19      0   45a51db4     f95efa07  a99f214a  bbeeb866   \n",
      "2  2.566852e+18      0   45a51db4     f95efa07  a99f214a  74e435bd   \n",
      "3  1.391783e+19      0   45a51db4     f95efa07  a99f214a  1ecc23fa   \n",
      "4  1.631158e+19      0   45a51db4     f95efa07  a99f214a  1e94c42f   \n",
      "\n",
      "  device_model  device_type  device_conn_type    C14     ...         C20  C21  \\\n",
      "0     9f8d0424            1                 0  20596     ...          -1  157   \n",
      "1     fce66524            1                 0  20596     ...      100148  157   \n",
      "2     5e12edef            1                 0  18993     ...      100034  157   \n",
      "3     7fe9fa2c            1                 0  20596     ...      100034  157   \n",
      "4     405444ca            1                 0  21647     ...          -1   51   \n",
      "\n",
      "   hour_rate   C1_rate  banner_pos_rate  site_id_rate  site_domain_rate  \\\n",
      "0   0.174714  0.169331         0.164272      0.118826           0.12275   \n",
      "1   0.174714  0.169331         0.164272      0.118826           0.12275   \n",
      "2   0.174714  0.169331         0.164272      0.118826           0.12275   \n",
      "3   0.173695  0.169331         0.164272      0.118826           0.12275   \n",
      "4   0.173695  0.169331         0.164272      0.118826           0.12275   \n",
      "\n",
      "   site_category_rate  app_id_rate  avg(click)  \n",
      "0             0.12858     0.201129    0.201129  \n",
      "1             0.12858     0.201129    0.201129  \n",
      "2             0.12858     0.201129    0.201129  \n",
      "3             0.12858     0.201129    0.201129  \n",
      "4             0.12858     0.201129    0.201129  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click app_domain app_category device_id device_ip  \\\n",
      "0  1.005585e+19      0   45a51db4     f95efa07  a99f214a  7db30ee7   \n",
      "1  1.052369e+19      0   45a51db4     f95efa07  a99f214a  bbeeb866   \n",
      "2  2.566852e+18      0   45a51db4     f95efa07  a99f214a  74e435bd   \n",
      "3  1.391783e+19      0   45a51db4     f95efa07  a99f214a  1ecc23fa   \n",
      "4  1.631158e+19      0   45a51db4     f95efa07  a99f214a  1e94c42f   \n",
      "\n",
      "  device_model  device_type  device_conn_type    C14       ...            C20  \\\n",
      "0     9f8d0424            1                 0  20596       ...             -1   \n",
      "1     fce66524            1                 0  20596       ...         100148   \n",
      "2     5e12edef            1                 0  18993       ...         100034   \n",
      "3     7fe9fa2c            1                 0  20596       ...         100034   \n",
      "4     405444ca            1                 0  21647       ...             -1   \n",
      "\n",
      "   C21  hour_rate   C1_rate  banner_pos_rate  site_id_rate  site_domain_rate  \\\n",
      "0  157   0.174714  0.169331         0.164272      0.118826           0.12275   \n",
      "1  157   0.174714  0.169331         0.164272      0.118826           0.12275   \n",
      "2  157   0.174714  0.169331         0.164272      0.118826           0.12275   \n",
      "3  157   0.173695  0.169331         0.164272      0.118826           0.12275   \n",
      "4   51   0.173695  0.169331         0.164272      0.118826           0.12275   \n",
      "\n",
      "   site_category_rate  app_id_rate  app_domain_rate  \n",
      "0             0.12858     0.201129         0.201129  \n",
      "1             0.12858     0.201129         0.201129  \n",
      "2             0.12858     0.201129         0.201129  \n",
      "3             0.12858     0.201129         0.201129  \n",
      "4             0.12858     0.201129         0.201129  \n",
      "\n",
      "[5 rows x 25 columns]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "             id  click app_category device_id device_ip device_model  \\\n",
      "0  1.005585e+19      0     f95efa07  a99f214a  7db30ee7     9f8d0424   \n",
      "1  1.052369e+19      0     f95efa07  a99f214a  bbeeb866     fce66524   \n",
      "2  2.566852e+18      0     f95efa07  a99f214a  74e435bd     5e12edef   \n",
      "3  1.391783e+19      0     f95efa07  a99f214a  1ecc23fa     7fe9fa2c   \n",
      "4  1.631158e+19      0     f95efa07  a99f214a  1e94c42f     405444ca   \n",
      "\n",
      "   device_type  device_conn_type    C14  C15       ...            C20  C21  \\\n",
      "0            1                 0  20596  320       ...             -1  157   \n",
      "1            1                 0  20596  320       ...         100148  157   \n",
      "2            1                 0  18993  320       ...         100034  157   \n",
      "3            1                 0  20596  320       ...         100034  157   \n",
      "4            1                 0  21647  320       ...             -1   51   \n",
      "\n",
      "   hour_rate   C1_rate  banner_pos_rate  site_id_rate  site_domain_rate  \\\n",
      "0   0.174714  0.169331         0.164272      0.118826           0.12275   \n",
      "1   0.174714  0.169331         0.164272      0.118826           0.12275   \n",
      "2   0.174714  0.169331         0.164272      0.118826           0.12275   \n",
      "3   0.173695  0.169331         0.164272      0.118826           0.12275   \n",
      "4   0.173695  0.169331         0.164272      0.118826           0.12275   \n",
      "\n",
      "   site_category_rate  app_id_rate  app_domain_rate  \n",
      "0             0.12858     0.201129         0.201129  \n",
      "1             0.12858     0.201129         0.201129  \n",
      "2             0.12858     0.201129         0.201129  \n",
      "3             0.12858     0.201129         0.201129  \n",
      "4             0.12858     0.201129         0.201129  \n",
      "\n",
      "[5 rows x 24 columns]\n",
      "  app_category  avg(click)\n",
      "0     07d7df22    0.199148\n",
      "1     09481d60    0.155194\n",
      "2     0bfbc358    0.016471\n",
      "3     0d82db25    0.160000\n",
      "4     0f2161f8    0.108118\n",
      "             id  click app_category device_id device_ip device_model  \\\n",
      "0  1.005585e+19      0     f95efa07  a99f214a  7db30ee7     9f8d0424   \n",
      "1  1.052369e+19      0     f95efa07  a99f214a  bbeeb866     fce66524   \n",
      "2  2.566852e+18      0     f95efa07  a99f214a  74e435bd     5e12edef   \n",
      "3  1.391783e+19      0     f95efa07  a99f214a  1ecc23fa     7fe9fa2c   \n",
      "4  1.631158e+19      0     f95efa07  a99f214a  1e94c42f     405444ca   \n",
      "\n",
      "   device_type  device_conn_type    C14  C15     ...      C21  hour_rate  \\\n",
      "0            1                 0  20596  320     ...      157   0.174714   \n",
      "1            1                 0  20596  320     ...      157   0.174714   \n",
      "2            1                 0  18993  320     ...      157   0.174714   \n",
      "3            1                 0  20596  320     ...      157   0.173695   \n",
      "4            1                 0  21647  320     ...       51   0.173695   \n",
      "\n",
      "    C1_rate  banner_pos_rate  site_id_rate  site_domain_rate  \\\n",
      "0  0.169331         0.164272      0.118826           0.12275   \n",
      "1  0.169331         0.164272      0.118826           0.12275   \n",
      "2  0.169331         0.164272      0.118826           0.12275   \n",
      "3  0.169331         0.164272      0.118826           0.12275   \n",
      "4  0.169331         0.164272      0.118826           0.12275   \n",
      "\n",
      "   site_category_rate  app_id_rate  app_domain_rate  avg(click)  \n",
      "0             0.12858     0.201129         0.201129    0.247588  \n",
      "1             0.12858     0.201129         0.201129    0.247588  \n",
      "2             0.12858     0.201129         0.201129    0.247588  \n",
      "3             0.12858     0.201129         0.201129    0.247588  \n",
      "4             0.12858     0.201129         0.201129    0.247588  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click app_category device_id device_ip device_model  \\\n",
      "0  1.005585e+19      0     f95efa07  a99f214a  7db30ee7     9f8d0424   \n",
      "1  1.052369e+19      0     f95efa07  a99f214a  bbeeb866     fce66524   \n",
      "2  2.566852e+18      0     f95efa07  a99f214a  74e435bd     5e12edef   \n",
      "3  1.391783e+19      0     f95efa07  a99f214a  1ecc23fa     7fe9fa2c   \n",
      "4  1.631158e+19      0     f95efa07  a99f214a  1e94c42f     405444ca   \n",
      "\n",
      "   device_type  device_conn_type    C14  C15        ...          C21  \\\n",
      "0            1                 0  20596  320        ...          157   \n",
      "1            1                 0  20596  320        ...          157   \n",
      "2            1                 0  18993  320        ...          157   \n",
      "3            1                 0  20596  320        ...          157   \n",
      "4            1                 0  21647  320        ...           51   \n",
      "\n",
      "   hour_rate   C1_rate  banner_pos_rate  site_id_rate  site_domain_rate  \\\n",
      "0   0.174714  0.169331         0.164272      0.118826           0.12275   \n",
      "1   0.174714  0.169331         0.164272      0.118826           0.12275   \n",
      "2   0.174714  0.169331         0.164272      0.118826           0.12275   \n",
      "3   0.173695  0.169331         0.164272      0.118826           0.12275   \n",
      "4   0.173695  0.169331         0.164272      0.118826           0.12275   \n",
      "\n",
      "   site_category_rate  app_id_rate  app_domain_rate  app_category_rate  \n",
      "0             0.12858     0.201129         0.201129           0.247588  \n",
      "1             0.12858     0.201129         0.201129           0.247588  \n",
      "2             0.12858     0.201129         0.201129           0.247588  \n",
      "3             0.12858     0.201129         0.201129           0.247588  \n",
      "4             0.12858     0.201129         0.201129           0.247588  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click device_id device_ip device_model  device_type  \\\n",
      "0  1.005585e+19      0  a99f214a  7db30ee7     9f8d0424            1   \n",
      "1  1.052369e+19      0  a99f214a  bbeeb866     fce66524            1   \n",
      "2  2.566852e+18      0  a99f214a  74e435bd     5e12edef            1   \n",
      "3  1.391783e+19      0  a99f214a  1ecc23fa     7fe9fa2c            1   \n",
      "4  1.631158e+19      0  a99f214a  1e94c42f     405444ca            1   \n",
      "\n",
      "   device_conn_type    C14  C15  C16        ...          C21  hour_rate  \\\n",
      "0                 0  20596  320   50        ...          157   0.174714   \n",
      "1                 0  20596  320   50        ...          157   0.174714   \n",
      "2                 0  18993  320   50        ...          157   0.174714   \n",
      "3                 0  20596  320   50        ...          157   0.173695   \n",
      "4                 0  21647  320   50        ...           51   0.173695   \n",
      "\n",
      "    C1_rate  banner_pos_rate  site_id_rate  site_domain_rate  \\\n",
      "0  0.169331         0.164272      0.118826           0.12275   \n",
      "1  0.169331         0.164272      0.118826           0.12275   \n",
      "2  0.169331         0.164272      0.118826           0.12275   \n",
      "3  0.169331         0.164272      0.118826           0.12275   \n",
      "4  0.169331         0.164272      0.118826           0.12275   \n",
      "\n",
      "   site_category_rate  app_id_rate  app_domain_rate  app_category_rate  \n",
      "0             0.12858     0.201129         0.201129           0.247588  \n",
      "1             0.12858     0.201129         0.201129           0.247588  \n",
      "2             0.12858     0.201129         0.201129           0.247588  \n",
      "3             0.12858     0.201129         0.201129           0.247588  \n",
      "4             0.12858     0.201129         0.201129           0.247588  \n",
      "\n",
      "[5 rows x 24 columns]\n",
      "  device_id  avg(click)\n",
      "0  00000414         0.0\n",
      "1  00000715         0.0\n",
      "2  00000919         0.0\n",
      "3  00000b7c         0.0\n",
      "4  00001237         0.0\n",
      "             id  click device_id device_ip device_model  device_type  \\\n",
      "0  1.005585e+19      0  a99f214a  7db30ee7     9f8d0424            1   \n",
      "1  1.052369e+19      0  a99f214a  bbeeb866     fce66524            1   \n",
      "2  2.566852e+18      0  a99f214a  74e435bd     5e12edef            1   \n",
      "3  1.391783e+19      0  a99f214a  1ecc23fa     7fe9fa2c            1   \n",
      "4  1.631158e+19      0  a99f214a  1e94c42f     405444ca            1   \n",
      "\n",
      "   device_conn_type    C14  C15  C16     ...      hour_rate   C1_rate  \\\n",
      "0                 0  20596  320   50     ...       0.174714  0.169331   \n",
      "1                 0  20596  320   50     ...       0.174714  0.169331   \n",
      "2                 0  18993  320   50     ...       0.174714  0.169331   \n",
      "3                 0  20596  320   50     ...       0.173695  0.169331   \n",
      "4                 0  21647  320   50     ...       0.173695  0.169331   \n",
      "\n",
      "   banner_pos_rate  site_id_rate  site_domain_rate  site_category_rate  \\\n",
      "0         0.164272      0.118826           0.12275             0.12858   \n",
      "1         0.164272      0.118826           0.12275             0.12858   \n",
      "2         0.164272      0.118826           0.12275             0.12858   \n",
      "3         0.164272      0.118826           0.12275             0.12858   \n",
      "4         0.164272      0.118826           0.12275             0.12858   \n",
      "\n",
      "   app_id_rate  app_domain_rate  app_category_rate  avg(click)  \n",
      "0     0.201129         0.201129           0.247588    0.174152  \n",
      "1     0.201129         0.201129           0.247588    0.174152  \n",
      "2     0.201129         0.201129           0.247588    0.174152  \n",
      "3     0.201129         0.201129           0.247588    0.174152  \n",
      "4     0.201129         0.201129           0.247588    0.174152  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click device_id device_ip device_model  device_type  \\\n",
      "0  1.005585e+19      0  a99f214a  7db30ee7     9f8d0424            1   \n",
      "1  1.052369e+19      0  a99f214a  bbeeb866     fce66524            1   \n",
      "2  2.566852e+18      0  a99f214a  74e435bd     5e12edef            1   \n",
      "3  1.391783e+19      0  a99f214a  1ecc23fa     7fe9fa2c            1   \n",
      "4  1.631158e+19      0  a99f214a  1e94c42f     405444ca            1   \n",
      "\n",
      "   device_conn_type    C14  C15  C16       ...        hour_rate   C1_rate  \\\n",
      "0                 0  20596  320   50       ...         0.174714  0.169331   \n",
      "1                 0  20596  320   50       ...         0.174714  0.169331   \n",
      "2                 0  18993  320   50       ...         0.174714  0.169331   \n",
      "3                 0  20596  320   50       ...         0.173695  0.169331   \n",
      "4                 0  21647  320   50       ...         0.173695  0.169331   \n",
      "\n",
      "   banner_pos_rate  site_id_rate  site_domain_rate  site_category_rate  \\\n",
      "0         0.164272      0.118826           0.12275             0.12858   \n",
      "1         0.164272      0.118826           0.12275             0.12858   \n",
      "2         0.164272      0.118826           0.12275             0.12858   \n",
      "3         0.164272      0.118826           0.12275             0.12858   \n",
      "4         0.164272      0.118826           0.12275             0.12858   \n",
      "\n",
      "   app_id_rate  app_domain_rate  app_category_rate  device_id_rate  \n",
      "0     0.201129         0.201129           0.247588        0.174152  \n",
      "1     0.201129         0.201129           0.247588        0.174152  \n",
      "2     0.201129         0.201129           0.247588        0.174152  \n",
      "3     0.201129         0.201129           0.247588        0.174152  \n",
      "4     0.201129         0.201129           0.247588        0.174152  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click device_ip device_model  device_type  device_conn_type  \\\n",
      "0  1.005585e+19      0  7db30ee7     9f8d0424            1                 0   \n",
      "1  1.052369e+19      0  bbeeb866     fce66524            1                 0   \n",
      "2  2.566852e+18      0  74e435bd     5e12edef            1                 0   \n",
      "3  1.391783e+19      0  1ecc23fa     7fe9fa2c            1                 0   \n",
      "4  1.631158e+19      0  1e94c42f     405444ca            1                 0   \n",
      "\n",
      "     C14  C15  C16   C17       ...        hour_rate   C1_rate  \\\n",
      "0  20596  320   50  2161       ...         0.174714  0.169331   \n",
      "1  20596  320   50  2161       ...         0.174714  0.169331   \n",
      "2  18993  320   50  2161       ...         0.174714  0.169331   \n",
      "3  20596  320   50  2161       ...         0.173695  0.169331   \n",
      "4  21647  320   50  2487       ...         0.173695  0.169331   \n",
      "\n",
      "   banner_pos_rate  site_id_rate  site_domain_rate  site_category_rate  \\\n",
      "0         0.164272      0.118826           0.12275             0.12858   \n",
      "1         0.164272      0.118826           0.12275             0.12858   \n",
      "2         0.164272      0.118826           0.12275             0.12858   \n",
      "3         0.164272      0.118826           0.12275             0.12858   \n",
      "4         0.164272      0.118826           0.12275             0.12858   \n",
      "\n",
      "   app_id_rate  app_domain_rate  app_category_rate  device_id_rate  \n",
      "0     0.201129         0.201129           0.247588        0.174152  \n",
      "1     0.201129         0.201129           0.247588        0.174152  \n",
      "2     0.201129         0.201129           0.247588        0.174152  \n",
      "3     0.201129         0.201129           0.247588        0.174152  \n",
      "4     0.201129         0.201129           0.247588        0.174152  \n",
      "\n",
      "[5 rows x 24 columns]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  device_ip  avg(click)\n",
      "0  0000016d         0.0\n",
      "1  00000262         0.0\n",
      "2  00000911         0.0\n",
      "3  000009d4         0.0\n",
      "4  00000a61         1.0\n",
      "             id  click device_ip device_model  device_type  device_conn_type  \\\n",
      "0  1.005585e+19      0  7db30ee7     9f8d0424            1                 0   \n",
      "1  1.052369e+19      0  bbeeb866     fce66524            1                 0   \n",
      "2  2.566852e+18      0  74e435bd     5e12edef            1                 0   \n",
      "3  1.391783e+19      0  1ecc23fa     7fe9fa2c            1                 0   \n",
      "4  1.631158e+19      0  1e94c42f     405444ca            1                 0   \n",
      "\n",
      "     C14  C15  C16   C17     ...       C1_rate  banner_pos_rate  site_id_rate  \\\n",
      "0  20596  320   50  2161     ...      0.169331         0.164272      0.118826   \n",
      "1  20596  320   50  2161     ...      0.169331         0.164272      0.118826   \n",
      "2  18993  320   50  2161     ...      0.169331         0.164272      0.118826   \n",
      "3  20596  320   50  2161     ...      0.169331         0.164272      0.118826   \n",
      "4  21647  320   50  2487     ...      0.169331         0.164272      0.118826   \n",
      "\n",
      "   site_domain_rate  site_category_rate  app_id_rate  app_domain_rate  \\\n",
      "0           0.12275             0.12858     0.201129         0.201129   \n",
      "1           0.12275             0.12858     0.201129         0.201129   \n",
      "2           0.12275             0.12858     0.201129         0.201129   \n",
      "3           0.12275             0.12858     0.201129         0.201129   \n",
      "4           0.12275             0.12858     0.201129         0.201129   \n",
      "\n",
      "   app_category_rate  device_id_rate  avg(click)  \n",
      "0           0.247588        0.174152         0.0  \n",
      "1           0.247588        0.174152         0.0  \n",
      "2           0.247588        0.174152         0.0  \n",
      "3           0.247588        0.174152         0.0  \n",
      "4           0.247588        0.174152         0.0  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click device_ip device_model  device_type  device_conn_type  \\\n",
      "0  1.005585e+19      0  7db30ee7     9f8d0424            1                 0   \n",
      "1  1.052369e+19      0  bbeeb866     fce66524            1                 0   \n",
      "2  2.566852e+18      0  74e435bd     5e12edef            1                 0   \n",
      "3  1.391783e+19      0  1ecc23fa     7fe9fa2c            1                 0   \n",
      "4  1.631158e+19      0  1e94c42f     405444ca            1                 0   \n",
      "\n",
      "     C14  C15  C16   C17       ...         C1_rate  banner_pos_rate  \\\n",
      "0  20596  320   50  2161       ...        0.169331         0.164272   \n",
      "1  20596  320   50  2161       ...        0.169331         0.164272   \n",
      "2  18993  320   50  2161       ...        0.169331         0.164272   \n",
      "3  20596  320   50  2161       ...        0.169331         0.164272   \n",
      "4  21647  320   50  2487       ...        0.169331         0.164272   \n",
      "\n",
      "   site_id_rate  site_domain_rate  site_category_rate  app_id_rate  \\\n",
      "0      0.118826           0.12275             0.12858     0.201129   \n",
      "1      0.118826           0.12275             0.12858     0.201129   \n",
      "2      0.118826           0.12275             0.12858     0.201129   \n",
      "3      0.118826           0.12275             0.12858     0.201129   \n",
      "4      0.118826           0.12275             0.12858     0.201129   \n",
      "\n",
      "   app_domain_rate  app_category_rate  device_id_rate  device_ip_rate  \n",
      "0         0.201129           0.247588        0.174152             0.0  \n",
      "1         0.201129           0.247588        0.174152             0.0  \n",
      "2         0.201129           0.247588        0.174152             0.0  \n",
      "3         0.201129           0.247588        0.174152             0.0  \n",
      "4         0.201129           0.247588        0.174152             0.0  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click device_model  device_type  device_conn_type    C14  \\\n",
      "0  1.005585e+19      0     9f8d0424            1                 0  20596   \n",
      "1  1.052369e+19      0     fce66524            1                 0  20596   \n",
      "2  2.566852e+18      0     5e12edef            1                 0  18993   \n",
      "3  1.391783e+19      0     7fe9fa2c            1                 0  20596   \n",
      "4  1.631158e+19      0     405444ca            1                 0  21647   \n",
      "\n",
      "   C15  C16   C17  C18       ...         C1_rate  banner_pos_rate  \\\n",
      "0  320   50  2161    0       ...        0.169331         0.164272   \n",
      "1  320   50  2161    0       ...        0.169331         0.164272   \n",
      "2  320   50  2161    0       ...        0.169331         0.164272   \n",
      "3  320   50  2161    0       ...        0.169331         0.164272   \n",
      "4  320   50  2487    1       ...        0.169331         0.164272   \n",
      "\n",
      "   site_id_rate  site_domain_rate  site_category_rate  app_id_rate  \\\n",
      "0      0.118826           0.12275             0.12858     0.201129   \n",
      "1      0.118826           0.12275             0.12858     0.201129   \n",
      "2      0.118826           0.12275             0.12858     0.201129   \n",
      "3      0.118826           0.12275             0.12858     0.201129   \n",
      "4      0.118826           0.12275             0.12858     0.201129   \n",
      "\n",
      "   app_domain_rate  app_category_rate  device_id_rate  device_ip_rate  \n",
      "0         0.201129           0.247588        0.174152             0.0  \n",
      "1         0.201129           0.247588        0.174152             0.0  \n",
      "2         0.201129           0.247588        0.174152             0.0  \n",
      "3         0.201129           0.247588        0.174152             0.0  \n",
      "4         0.201129           0.247588        0.174152             0.0  \n",
      "\n",
      "[5 rows x 24 columns]\n",
      "  device_model  avg(click)\n",
      "0     00097428    0.233850\n",
      "1     0009f4d7    0.169109\n",
      "2     000ab70c    0.000000\n",
      "3     00161f51    0.394737\n",
      "4     002ee63d    0.000000\n",
      "             id  click device_model  device_type  device_conn_type    C14  \\\n",
      "0  1.005585e+19      0     9f8d0424            1                 0  20596   \n",
      "1  1.547673e+19      0     9f8d0424            4                 3  15704   \n",
      "2  9.076250e+16      1     9f8d0424            1                 0  20215   \n",
      "3  1.449938e+19      0     9f8d0424            1                 0  20362   \n",
      "4  1.352463e+19      0     9f8d0424            1                 0  20391   \n",
      "\n",
      "   C15  C16   C17  C18     ...      banner_pos_rate  site_id_rate  \\\n",
      "0  320   50  2161    0     ...             0.164272      0.118826   \n",
      "1  320   50  1722    0     ...             0.183614      0.118826   \n",
      "2  320   50  2316    0     ...             0.183614      0.118826   \n",
      "3  320   50  2333    0     ...             0.183614      0.118826   \n",
      "4  320   50  2340    3     ...             0.183614      0.118826   \n",
      "\n",
      "   site_domain_rate  site_category_rate  app_id_rate  app_domain_rate  \\\n",
      "0           0.12275             0.12858     0.201129         0.201129   \n",
      "1           0.12275             0.12858     0.057285         0.194877   \n",
      "2           0.12275             0.12858     0.183931         0.138114   \n",
      "3           0.12275             0.12858     0.183931         0.138114   \n",
      "4           0.12275             0.12858     0.183931         0.138114   \n",
      "\n",
      "   app_category_rate  device_id_rate  device_ip_rate  avg(click)  \n",
      "0           0.247588        0.174152        0.000000    0.152539  \n",
      "1           0.108118        0.000000        0.194765    0.152539  \n",
      "2           0.108118        0.174152        1.000000    0.152539  \n",
      "3           0.108118        0.174152        0.000000    0.152539  \n",
      "4           0.108118        0.174152        0.000000    0.152539  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click device_model  device_type  device_conn_type    C14  \\\n",
      "0  1.005585e+19      0     9f8d0424            1                 0  20596   \n",
      "1  1.547673e+19      0     9f8d0424            4                 3  15704   \n",
      "2  9.076250e+16      1     9f8d0424            1                 0  20215   \n",
      "3  1.449938e+19      0     9f8d0424            1                 0  20362   \n",
      "4  1.352463e+19      0     9f8d0424            1                 0  20391   \n",
      "\n",
      "   C15  C16   C17  C18        ...          banner_pos_rate  site_id_rate  \\\n",
      "0  320   50  2161    0        ...                 0.164272      0.118826   \n",
      "1  320   50  1722    0        ...                 0.183614      0.118826   \n",
      "2  320   50  2316    0        ...                 0.183614      0.118826   \n",
      "3  320   50  2333    0        ...                 0.183614      0.118826   \n",
      "4  320   50  2340    3        ...                 0.183614      0.118826   \n",
      "\n",
      "   site_domain_rate  site_category_rate  app_id_rate  app_domain_rate  \\\n",
      "0           0.12275             0.12858     0.201129         0.201129   \n",
      "1           0.12275             0.12858     0.057285         0.194877   \n",
      "2           0.12275             0.12858     0.183931         0.138114   \n",
      "3           0.12275             0.12858     0.183931         0.138114   \n",
      "4           0.12275             0.12858     0.183931         0.138114   \n",
      "\n",
      "   app_category_rate  device_id_rate  device_ip_rate  device_model_rate  \n",
      "0           0.247588        0.174152        0.000000           0.152539  \n",
      "1           0.108118        0.000000        0.194765           0.152539  \n",
      "2           0.108118        0.174152        1.000000           0.152539  \n",
      "3           0.108118        0.174152        0.000000           0.152539  \n",
      "4           0.108118        0.174152        0.000000           0.152539  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click  device_type  device_conn_type    C14  C15  C16   C17  \\\n",
      "0  1.005585e+19      0            1                 0  20596  320   50  2161   \n",
      "1  1.547673e+19      0            4                 3  15704  320   50  1722   \n",
      "2  9.076250e+16      1            1                 0  20215  320   50  2316   \n",
      "3  1.449938e+19      0            1                 0  20362  320   50  2333   \n",
      "4  1.352463e+19      0            1                 0  20391  320   50  2340   \n",
      "\n",
      "   C18   C19        ...          banner_pos_rate  site_id_rate  \\\n",
      "0    0    35        ...                 0.164272      0.118826   \n",
      "1    0    35        ...                 0.183614      0.118826   \n",
      "2    0   167        ...                 0.183614      0.118826   \n",
      "3    0    39        ...                 0.183614      0.118826   \n",
      "4    3  1065        ...                 0.183614      0.118826   \n",
      "\n",
      "   site_domain_rate  site_category_rate  app_id_rate  app_domain_rate  \\\n",
      "0           0.12275             0.12858     0.201129         0.201129   \n",
      "1           0.12275             0.12858     0.057285         0.194877   \n",
      "2           0.12275             0.12858     0.183931         0.138114   \n",
      "3           0.12275             0.12858     0.183931         0.138114   \n",
      "4           0.12275             0.12858     0.183931         0.138114   \n",
      "\n",
      "   app_category_rate  device_id_rate  device_ip_rate  device_model_rate  \n",
      "0           0.247588        0.174152        0.000000           0.152539  \n",
      "1           0.108118        0.000000        0.194765           0.152539  \n",
      "2           0.108118        0.174152        1.000000           0.152539  \n",
      "3           0.108118        0.174152        0.000000           0.152539  \n",
      "4           0.108118        0.174152        0.000000           0.152539  \n",
      "\n",
      "[5 rows x 24 columns]\n",
      "   device_type  avg(click)\n",
      "0            0    0.210731\n",
      "1            1    0.169176\n",
      "2            2    0.064516\n",
      "3            4    0.095444\n",
      "4            5    0.093842\n",
      "             id  click  device_type  device_conn_type    C14  C15  C16   C17  \\\n",
      "0  1.005585e+19      0            1                 0  20596  320   50  2161   \n",
      "1  9.076250e+16      1            1                 0  20215  320   50  2316   \n",
      "2  1.449938e+19      0            1                 0  20362  320   50  2333   \n",
      "3  1.352463e+19      0            1                 0  20391  320   50  2340   \n",
      "4  6.837967e+18      0            1                 0  20596  320   50  2161   \n",
      "\n",
      "   C18   C19     ...      site_id_rate  site_domain_rate  site_category_rate  \\\n",
      "0    0    35     ...          0.118826           0.12275             0.12858   \n",
      "1    0   167     ...          0.118826           0.12275             0.12858   \n",
      "2    0    39     ...          0.118826           0.12275             0.12858   \n",
      "3    3  1065     ...          0.118826           0.12275             0.12858   \n",
      "4    0    35     ...          0.118826           0.12275             0.12858   \n",
      "\n",
      "   app_id_rate  app_domain_rate  app_category_rate  device_id_rate  \\\n",
      "0     0.201129         0.201129           0.247588        0.174152   \n",
      "1     0.183931         0.138114           0.108118        0.174152   \n",
      "2     0.183931         0.138114           0.108118        0.174152   \n",
      "3     0.183931         0.138114           0.108118        0.174152   \n",
      "4     0.057178         0.194877           0.108118        0.174152   \n",
      "\n",
      "   device_ip_rate  device_model_rate  avg(click)  \n",
      "0             0.0           0.152539    0.169176  \n",
      "1             1.0           0.152539    0.169176  \n",
      "2             0.0           0.152539    0.169176  \n",
      "3             0.0           0.152539    0.169176  \n",
      "4             0.0           0.152539    0.169176  \n",
      "\n",
      "[5 rows x 25 columns]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "             id  click  device_type  device_conn_type    C14  C15  C16   C17  \\\n",
      "0  1.005585e+19      0            1                 0  20596  320   50  2161   \n",
      "1  9.076250e+16      1            1                 0  20215  320   50  2316   \n",
      "2  1.449938e+19      0            1                 0  20362  320   50  2333   \n",
      "3  1.352463e+19      0            1                 0  20391  320   50  2340   \n",
      "4  6.837967e+18      0            1                 0  20596  320   50  2161   \n",
      "\n",
      "   C18   C19        ...         site_id_rate  site_domain_rate  \\\n",
      "0    0    35        ...             0.118826           0.12275   \n",
      "1    0   167        ...             0.118826           0.12275   \n",
      "2    0    39        ...             0.118826           0.12275   \n",
      "3    3  1065        ...             0.118826           0.12275   \n",
      "4    0    35        ...             0.118826           0.12275   \n",
      "\n",
      "   site_category_rate  app_id_rate  app_domain_rate  app_category_rate  \\\n",
      "0             0.12858     0.201129         0.201129           0.247588   \n",
      "1             0.12858     0.183931         0.138114           0.108118   \n",
      "2             0.12858     0.183931         0.138114           0.108118   \n",
      "3             0.12858     0.183931         0.138114           0.108118   \n",
      "4             0.12858     0.057178         0.194877           0.108118   \n",
      "\n",
      "   device_id_rate  device_ip_rate  device_model_rate  device_type_rate  \n",
      "0        0.174152             0.0           0.152539          0.169176  \n",
      "1        0.174152             1.0           0.152539          0.169176  \n",
      "2        0.174152             0.0           0.152539          0.169176  \n",
      "3        0.174152             0.0           0.152539          0.169176  \n",
      "4        0.174152             0.0           0.152539          0.169176  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click  device_conn_type    C14  C15  C16   C17  C18   C19  \\\n",
      "0  1.005585e+19      0                 0  20596  320   50  2161    0    35   \n",
      "1  9.076250e+16      1                 0  20215  320   50  2316    0   167   \n",
      "2  1.449938e+19      0                 0  20362  320   50  2333    0    39   \n",
      "3  1.352463e+19      0                 0  20391  320   50  2340    3  1065   \n",
      "4  6.837967e+18      0                 0  20596  320   50  2161    0    35   \n",
      "\n",
      "      C20        ...         site_id_rate  site_domain_rate  \\\n",
      "0      -1        ...             0.118826           0.12275   \n",
      "1  100075        ...             0.118826           0.12275   \n",
      "2      -1        ...             0.118826           0.12275   \n",
      "3  100111        ...             0.118826           0.12275   \n",
      "4  100212        ...             0.118826           0.12275   \n",
      "\n",
      "   site_category_rate  app_id_rate  app_domain_rate  app_category_rate  \\\n",
      "0             0.12858     0.201129         0.201129           0.247588   \n",
      "1             0.12858     0.183931         0.138114           0.108118   \n",
      "2             0.12858     0.183931         0.138114           0.108118   \n",
      "3             0.12858     0.183931         0.138114           0.108118   \n",
      "4             0.12858     0.057178         0.194877           0.108118   \n",
      "\n",
      "   device_id_rate  device_ip_rate  device_model_rate  device_type_rate  \n",
      "0        0.174152             0.0           0.152539          0.169176  \n",
      "1        0.174152             1.0           0.152539          0.169176  \n",
      "2        0.174152             0.0           0.152539          0.169176  \n",
      "3        0.174152             0.0           0.152539          0.169176  \n",
      "4        0.174152             0.0           0.152539          0.169176  \n",
      "\n",
      "[5 rows x 24 columns]\n",
      "   device_conn_type  avg(click)\n",
      "0                 0    0.181125\n",
      "1                 2    0.135289\n",
      "2                 3    0.044043\n",
      "3                 5    0.029611\n",
      "             id  click  device_conn_type    C14  C15  C16   C17  C18   C19  \\\n",
      "0  1.005585e+19      0                 0  20596  320   50  2161    0    35   \n",
      "1  9.076250e+16      1                 0  20215  320   50  2316    0   167   \n",
      "2  1.449938e+19      0                 0  20362  320   50  2333    0    39   \n",
      "3  1.352463e+19      0                 0  20391  320   50  2340    3  1065   \n",
      "4  6.837967e+18      0                 0  20596  320   50  2161    0    35   \n",
      "\n",
      "      C20     ...      site_domain_rate  site_category_rate  app_id_rate  \\\n",
      "0      -1     ...               0.12275             0.12858     0.201129   \n",
      "1  100075     ...               0.12275             0.12858     0.183931   \n",
      "2      -1     ...               0.12275             0.12858     0.183931   \n",
      "3  100111     ...               0.12275             0.12858     0.183931   \n",
      "4  100212     ...               0.12275             0.12858     0.057178   \n",
      "\n",
      "   app_domain_rate  app_category_rate  device_id_rate  device_ip_rate  \\\n",
      "0         0.201129           0.247588        0.174152             0.0   \n",
      "1         0.138114           0.108118        0.174152             1.0   \n",
      "2         0.138114           0.108118        0.174152             0.0   \n",
      "3         0.138114           0.108118        0.174152             0.0   \n",
      "4         0.194877           0.108118        0.174152             0.0   \n",
      "\n",
      "   device_model_rate  device_type_rate  avg(click)  \n",
      "0           0.152539          0.169176    0.181125  \n",
      "1           0.152539          0.169176    0.181125  \n",
      "2           0.152539          0.169176    0.181125  \n",
      "3           0.152539          0.169176    0.181125  \n",
      "4           0.152539          0.169176    0.181125  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click  device_conn_type    C14  C15  C16   C17  C18   C19  \\\n",
      "0  1.005585e+19      0                 0  20596  320   50  2161    0    35   \n",
      "1  9.076250e+16      1                 0  20215  320   50  2316    0   167   \n",
      "2  1.449938e+19      0                 0  20362  320   50  2333    0    39   \n",
      "3  1.352463e+19      0                 0  20391  320   50  2340    3  1065   \n",
      "4  6.837967e+18      0                 0  20596  320   50  2161    0    35   \n",
      "\n",
      "      C20          ...            site_domain_rate  site_category_rate  \\\n",
      "0      -1          ...                     0.12275             0.12858   \n",
      "1  100075          ...                     0.12275             0.12858   \n",
      "2      -1          ...                     0.12275             0.12858   \n",
      "3  100111          ...                     0.12275             0.12858   \n",
      "4  100212          ...                     0.12275             0.12858   \n",
      "\n",
      "   app_id_rate  app_domain_rate  app_category_rate  device_id_rate  \\\n",
      "0     0.201129         0.201129           0.247588        0.174152   \n",
      "1     0.183931         0.138114           0.108118        0.174152   \n",
      "2     0.183931         0.138114           0.108118        0.174152   \n",
      "3     0.183931         0.138114           0.108118        0.174152   \n",
      "4     0.057178         0.194877           0.108118        0.174152   \n",
      "\n",
      "   device_ip_rate  device_model_rate  device_type_rate  device_conn_type_rate  \n",
      "0             0.0           0.152539          0.169176               0.181125  \n",
      "1             1.0           0.152539          0.169176               0.181125  \n",
      "2             0.0           0.152539          0.169176               0.181125  \n",
      "3             0.0           0.152539          0.169176               0.181125  \n",
      "4             0.0           0.152539          0.169176               0.181125  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click    C14  C15  C16   C17  C18   C19     C20  C21  \\\n",
      "0  1.005585e+19      0  20596  320   50  2161    0    35      -1  157   \n",
      "1  9.076250e+16      1  20215  320   50  2316    0   167  100075   16   \n",
      "2  1.449938e+19      0  20362  320   50  2333    0    39      -1  157   \n",
      "3  1.352463e+19      0  20391  320   50  2340    3  1065  100111  159   \n",
      "4  6.837967e+18      0  20596  320   50  2161    0    35  100212  157   \n",
      "\n",
      "           ...            site_domain_rate  site_category_rate  app_id_rate  \\\n",
      "0          ...                     0.12275             0.12858     0.201129   \n",
      "1          ...                     0.12275             0.12858     0.183931   \n",
      "2          ...                     0.12275             0.12858     0.183931   \n",
      "3          ...                     0.12275             0.12858     0.183931   \n",
      "4          ...                     0.12275             0.12858     0.057178   \n",
      "\n",
      "   app_domain_rate  app_category_rate  device_id_rate  device_ip_rate  \\\n",
      "0         0.201129           0.247588        0.174152             0.0   \n",
      "1         0.138114           0.108118        0.174152             1.0   \n",
      "2         0.138114           0.108118        0.174152             0.0   \n",
      "3         0.138114           0.108118        0.174152             0.0   \n",
      "4         0.194877           0.108118        0.174152             0.0   \n",
      "\n",
      "   device_model_rate  device_type_rate  device_conn_type_rate  \n",
      "0           0.152539          0.169176               0.181125  \n",
      "1           0.152539          0.169176               0.181125  \n",
      "2           0.152539          0.169176               0.181125  \n",
      "3           0.152539          0.169176               0.181125  \n",
      "4           0.152539          0.169176               0.181125  \n",
      "\n",
      "[5 rows x 24 columns]\n",
      "   C14  avg(click)\n",
      "0  375    0.230891\n",
      "1  376    0.277778\n",
      "2  377    0.213909\n",
      "3  380    0.201186\n",
      "4  381    0.225704\n",
      "             id  click    C14  C15  C16   C17  C18  C19     C20  C21  \\\n",
      "0  1.005585e+19      0  20596  320   50  2161    0   35      -1  157   \n",
      "1  6.837967e+18      0  20596  320   50  2161    0   35  100212  157   \n",
      "2  1.052369e+19      0  20596  320   50  2161    0   35  100148  157   \n",
      "3  1.391783e+19      0  20596  320   50  2161    0   35  100034  157   \n",
      "4  1.503958e+19      0  20596  320   50  2161    0   35      -1  157   \n",
      "\n",
      "      ...      site_category_rate  app_id_rate  app_domain_rate  \\\n",
      "0     ...                 0.12858     0.201129         0.201129   \n",
      "1     ...                 0.12858     0.057178         0.194877   \n",
      "2     ...                 0.12858     0.201129         0.201129   \n",
      "3     ...                 0.12858     0.201129         0.201129   \n",
      "4     ...                 0.12858     0.201129         0.201129   \n",
      "\n",
      "   app_category_rate  device_id_rate  device_ip_rate  device_model_rate  \\\n",
      "0           0.247588        0.174152             0.0           0.152539   \n",
      "1           0.108118        0.174152             0.0           0.152539   \n",
      "2           0.247588        0.174152             0.0           0.267091   \n",
      "3           0.247588        0.174152             0.0           0.308108   \n",
      "4           0.247588        0.174152             0.0           0.108599   \n",
      "\n",
      "   device_type_rate  device_conn_type_rate  avg(click)  \n",
      "0          0.169176               0.181125    0.079519  \n",
      "1          0.169176               0.181125    0.079519  \n",
      "2          0.169176               0.181125    0.079519  \n",
      "3          0.169176               0.181125    0.079519  \n",
      "4          0.169176               0.181125    0.079519  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click    C14  C15  C16   C17  C18  C19     C20  C21  \\\n",
      "0  1.005585e+19      0  20596  320   50  2161    0   35      -1  157   \n",
      "1  6.837967e+18      0  20596  320   50  2161    0   35  100212  157   \n",
      "2  1.052369e+19      0  20596  320   50  2161    0   35  100148  157   \n",
      "3  1.391783e+19      0  20596  320   50  2161    0   35  100034  157   \n",
      "4  1.503958e+19      0  20596  320   50  2161    0   35      -1  157   \n",
      "\n",
      "     ...     site_category_rate  app_id_rate  app_domain_rate  \\\n",
      "0    ...                0.12858     0.201129         0.201129   \n",
      "1    ...                0.12858     0.057178         0.194877   \n",
      "2    ...                0.12858     0.201129         0.201129   \n",
      "3    ...                0.12858     0.201129         0.201129   \n",
      "4    ...                0.12858     0.201129         0.201129   \n",
      "\n",
      "   app_category_rate  device_id_rate  device_ip_rate  device_model_rate  \\\n",
      "0           0.247588        0.174152             0.0           0.152539   \n",
      "1           0.108118        0.174152             0.0           0.152539   \n",
      "2           0.247588        0.174152             0.0           0.267091   \n",
      "3           0.247588        0.174152             0.0           0.308108   \n",
      "4           0.247588        0.174152             0.0           0.108599   \n",
      "\n",
      "   device_type_rate  device_conn_type_rate  C14_rate  \n",
      "0          0.169176               0.181125  0.079519  \n",
      "1          0.169176               0.181125  0.079519  \n",
      "2          0.169176               0.181125  0.079519  \n",
      "3          0.169176               0.181125  0.079519  \n",
      "4          0.169176               0.181125  0.079519  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click  C15  C16   C17  C18  C19     C20  C21  hour_rate  \\\n",
      "0  1.005585e+19      0  320   50  2161    0   35      -1  157   0.174714   \n",
      "1  6.837967e+18      0  320   50  2161    0   35  100212  157   0.173695   \n",
      "2  1.052369e+19      0  320   50  2161    0   35  100148  157   0.174714   \n",
      "3  1.391783e+19      0  320   50  2161    0   35  100034  157   0.173695   \n",
      "4  1.503958e+19      0  320   50  2161    0   35      -1  157   0.150696   \n",
      "\n",
      "     ...     site_category_rate  app_id_rate  app_domain_rate  \\\n",
      "0    ...                0.12858     0.201129         0.201129   \n",
      "1    ...                0.12858     0.057178         0.194877   \n",
      "2    ...                0.12858     0.201129         0.201129   \n",
      "3    ...                0.12858     0.201129         0.201129   \n",
      "4    ...                0.12858     0.201129         0.201129   \n",
      "\n",
      "   app_category_rate  device_id_rate  device_ip_rate  device_model_rate  \\\n",
      "0           0.247588        0.174152             0.0           0.152539   \n",
      "1           0.108118        0.174152             0.0           0.152539   \n",
      "2           0.247588        0.174152             0.0           0.267091   \n",
      "3           0.247588        0.174152             0.0           0.308108   \n",
      "4           0.247588        0.174152             0.0           0.108599   \n",
      "\n",
      "   device_type_rate  device_conn_type_rate  C14_rate  \n",
      "0          0.169176               0.181125  0.079519  \n",
      "1          0.169176               0.181125  0.079519  \n",
      "2          0.169176               0.181125  0.079519  \n",
      "3          0.169176               0.181125  0.079519  \n",
      "4          0.169176               0.181125  0.079519  \n",
      "\n",
      "[5 rows x 24 columns]\n",
      "   C15  avg(click)\n",
      "0  120    0.018899\n",
      "1  216    0.127138\n",
      "2  300    0.359358\n",
      "3  320    0.158608\n",
      "4  480    0.267665\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "             id  click  C15  C16   C17  C18  C19     C20  C21  hour_rate  \\\n",
      "0  1.005585e+19      0  320   50  2161    0   35      -1  157   0.174714   \n",
      "1  6.837967e+18      0  320   50  2161    0   35  100212  157   0.173695   \n",
      "2  1.052369e+19      0  320   50  2161    0   35  100148  157   0.174714   \n",
      "3  1.391783e+19      0  320   50  2161    0   35  100034  157   0.173695   \n",
      "4  1.503958e+19      0  320   50  2161    0   35      -1  157   0.150696   \n",
      "\n",
      "      ...      app_id_rate  app_domain_rate  app_category_rate  \\\n",
      "0     ...         0.201129         0.201129           0.247588   \n",
      "1     ...         0.057178         0.194877           0.108118   \n",
      "2     ...         0.201129         0.201129           0.247588   \n",
      "3     ...         0.201129         0.201129           0.247588   \n",
      "4     ...         0.201129         0.201129           0.247588   \n",
      "\n",
      "   device_id_rate  device_ip_rate  device_model_rate  device_type_rate  \\\n",
      "0        0.174152             0.0           0.152539          0.169176   \n",
      "1        0.174152             0.0           0.152539          0.169176   \n",
      "2        0.174152             0.0           0.267091          0.169176   \n",
      "3        0.174152             0.0           0.308108          0.169176   \n",
      "4        0.174152             0.0           0.108599          0.169176   \n",
      "\n",
      "   device_conn_type_rate  C14_rate  avg(click)  \n",
      "0               0.181125  0.079519    0.158608  \n",
      "1               0.181125  0.079519    0.158608  \n",
      "2               0.181125  0.079519    0.158608  \n",
      "3               0.181125  0.079519    0.158608  \n",
      "4               0.181125  0.079519    0.158608  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click  C15  C16   C17  C18  C19     C20  C21  hour_rate  \\\n",
      "0  1.005585e+19      0  320   50  2161    0   35      -1  157   0.174714   \n",
      "1  6.837967e+18      0  320   50  2161    0   35  100212  157   0.173695   \n",
      "2  1.052369e+19      0  320   50  2161    0   35  100148  157   0.174714   \n",
      "3  1.391783e+19      0  320   50  2161    0   35  100034  157   0.173695   \n",
      "4  1.503958e+19      0  320   50  2161    0   35      -1  157   0.150696   \n",
      "\n",
      "     ...     app_id_rate  app_domain_rate  app_category_rate  device_id_rate  \\\n",
      "0    ...        0.201129         0.201129           0.247588        0.174152   \n",
      "1    ...        0.057178         0.194877           0.108118        0.174152   \n",
      "2    ...        0.201129         0.201129           0.247588        0.174152   \n",
      "3    ...        0.201129         0.201129           0.247588        0.174152   \n",
      "4    ...        0.201129         0.201129           0.247588        0.174152   \n",
      "\n",
      "   device_ip_rate  device_model_rate  device_type_rate  device_conn_type_rate  \\\n",
      "0             0.0           0.152539          0.169176               0.181125   \n",
      "1             0.0           0.152539          0.169176               0.181125   \n",
      "2             0.0           0.267091          0.169176               0.181125   \n",
      "3             0.0           0.308108          0.169176               0.181125   \n",
      "4             0.0           0.108599          0.169176               0.181125   \n",
      "\n",
      "   C14_rate  C15_rate  \n",
      "0  0.079519  0.158608  \n",
      "1  0.079519  0.158608  \n",
      "2  0.079519  0.158608  \n",
      "3  0.079519  0.158608  \n",
      "4  0.079519  0.158608  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click  C16   C17  C18  C19     C20  C21  hour_rate   C1_rate  \\\n",
      "0  1.005585e+19      0   50  2161    0   35      -1  157   0.174714  0.169331   \n",
      "1  6.837967e+18      0   50  2161    0   35  100212  157   0.173695  0.169331   \n",
      "2  1.052369e+19      0   50  2161    0   35  100148  157   0.174714  0.169331   \n",
      "3  1.391783e+19      0   50  2161    0   35  100034  157   0.173695  0.169331   \n",
      "4  1.503958e+19      0   50  2161    0   35      -1  157   0.150696  0.169331   \n",
      "\n",
      "     ...     app_id_rate  app_domain_rate  app_category_rate  device_id_rate  \\\n",
      "0    ...        0.201129         0.201129           0.247588        0.174152   \n",
      "1    ...        0.057178         0.194877           0.108118        0.174152   \n",
      "2    ...        0.201129         0.201129           0.247588        0.174152   \n",
      "3    ...        0.201129         0.201129           0.247588        0.174152   \n",
      "4    ...        0.201129         0.201129           0.247588        0.174152   \n",
      "\n",
      "   device_ip_rate  device_model_rate  device_type_rate  device_conn_type_rate  \\\n",
      "0             0.0           0.152539          0.169176               0.181125   \n",
      "1             0.0           0.152539          0.169176               0.181125   \n",
      "2             0.0           0.267091          0.169176               0.181125   \n",
      "3             0.0           0.308108          0.169176               0.181125   \n",
      "4             0.0           0.108599          0.169176               0.181125   \n",
      "\n",
      "   C14_rate  C15_rate  \n",
      "0  0.079519  0.158608  \n",
      "1  0.079519  0.158608  \n",
      "2  0.079519  0.158608  \n",
      "3  0.079519  0.158608  \n",
      "4  0.079519  0.158608  \n",
      "\n",
      "[5 rows x 24 columns]\n",
      "   C16  avg(click)\n",
      "0   20    0.018899\n",
      "1   36    0.127138\n",
      "2   50    0.158315\n",
      "3   90    0.056968\n",
      "4  250    0.421347\n",
      "             id  click  C16   C17  C18  C19     C20  C21  hour_rate   C1_rate  \\\n",
      "0  1.005585e+19      0   50  2161    0   35      -1  157   0.174714  0.169331   \n",
      "1  6.837967e+18      0   50  2161    0   35  100212  157   0.173695  0.169331   \n",
      "2  1.052369e+19      0   50  2161    0   35  100148  157   0.174714  0.169331   \n",
      "3  1.391783e+19      0   50  2161    0   35  100034  157   0.173695  0.169331   \n",
      "4  1.503958e+19      0   50  2161    0   35      -1  157   0.150696  0.169331   \n",
      "\n",
      "      ...      app_domain_rate  app_category_rate  device_id_rate  \\\n",
      "0     ...             0.201129           0.247588        0.174152   \n",
      "1     ...             0.194877           0.108118        0.174152   \n",
      "2     ...             0.201129           0.247588        0.174152   \n",
      "3     ...             0.201129           0.247588        0.174152   \n",
      "4     ...             0.201129           0.247588        0.174152   \n",
      "\n",
      "   device_ip_rate  device_model_rate  device_type_rate  device_conn_type_rate  \\\n",
      "0             0.0           0.152539          0.169176               0.181125   \n",
      "1             0.0           0.152539          0.169176               0.181125   \n",
      "2             0.0           0.267091          0.169176               0.181125   \n",
      "3             0.0           0.308108          0.169176               0.181125   \n",
      "4             0.0           0.108599          0.169176               0.181125   \n",
      "\n",
      "   C14_rate  C15_rate  avg(click)  \n",
      "0  0.079519  0.158608    0.158315  \n",
      "1  0.079519  0.158608    0.158315  \n",
      "2  0.079519  0.158608    0.158315  \n",
      "3  0.079519  0.158608    0.158315  \n",
      "4  0.079519  0.158608    0.158315  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click  C16   C17  C18  C19     C20  C21  hour_rate   C1_rate  \\\n",
      "0  1.005585e+19      0   50  2161    0   35      -1  157   0.174714  0.169331   \n",
      "1  6.837967e+18      0   50  2161    0   35  100212  157   0.173695  0.169331   \n",
      "2  1.052369e+19      0   50  2161    0   35  100148  157   0.174714  0.169331   \n",
      "3  1.391783e+19      0   50  2161    0   35  100034  157   0.173695  0.169331   \n",
      "4  1.503958e+19      0   50  2161    0   35      -1  157   0.150696  0.169331   \n",
      "\n",
      "     ...     app_domain_rate  app_category_rate  device_id_rate  \\\n",
      "0    ...            0.201129           0.247588        0.174152   \n",
      "1    ...            0.194877           0.108118        0.174152   \n",
      "2    ...            0.201129           0.247588        0.174152   \n",
      "3    ...            0.201129           0.247588        0.174152   \n",
      "4    ...            0.201129           0.247588        0.174152   \n",
      "\n",
      "   device_ip_rate  device_model_rate  device_type_rate  device_conn_type_rate  \\\n",
      "0             0.0           0.152539          0.169176               0.181125   \n",
      "1             0.0           0.152539          0.169176               0.181125   \n",
      "2             0.0           0.267091          0.169176               0.181125   \n",
      "3             0.0           0.308108          0.169176               0.181125   \n",
      "4             0.0           0.108599          0.169176               0.181125   \n",
      "\n",
      "   C14_rate  C15_rate  C16_rate  \n",
      "0  0.079519  0.158608  0.158315  \n",
      "1  0.079519  0.158608  0.158315  \n",
      "2  0.079519  0.158608  0.158315  \n",
      "3  0.079519  0.158608  0.158315  \n",
      "4  0.079519  0.158608  0.158315  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click   C17  C18  C19     C20  C21  hour_rate   C1_rate  \\\n",
      "0  1.005585e+19      0  2161    0   35      -1  157   0.174714  0.169331   \n",
      "1  6.837967e+18      0  2161    0   35  100212  157   0.173695  0.169331   \n",
      "2  1.052369e+19      0  2161    0   35  100148  157   0.174714  0.169331   \n",
      "3  1.391783e+19      0  2161    0   35  100034  157   0.173695  0.169331   \n",
      "4  1.503958e+19      0  2161    0   35      -1  157   0.150696  0.169331   \n",
      "\n",
      "   banner_pos_rate    ...     app_domain_rate  app_category_rate  \\\n",
      "0         0.164272    ...            0.201129           0.247588   \n",
      "1         0.164272    ...            0.194877           0.108118   \n",
      "2         0.164272    ...            0.201129           0.247588   \n",
      "3         0.164272    ...            0.201129           0.247588   \n",
      "4         0.164272    ...            0.201129           0.247588   \n",
      "\n",
      "   device_id_rate  device_ip_rate  device_model_rate  device_type_rate  \\\n",
      "0        0.174152             0.0           0.152539          0.169176   \n",
      "1        0.174152             0.0           0.152539          0.169176   \n",
      "2        0.174152             0.0           0.267091          0.169176   \n",
      "3        0.174152             0.0           0.308108          0.169176   \n",
      "4        0.174152             0.0           0.108599          0.169176   \n",
      "\n",
      "   device_conn_type_rate  C14_rate  C15_rate  C16_rate  \n",
      "0               0.181125  0.079519  0.158608  0.158315  \n",
      "1               0.181125  0.079519  0.158608  0.158315  \n",
      "2               0.181125  0.079519  0.158608  0.158315  \n",
      "3               0.181125  0.079519  0.158608  0.158315  \n",
      "4               0.181125  0.079519  0.158608  0.158315  \n",
      "\n",
      "[5 rows x 24 columns]\n",
      "   C17  avg(click)\n",
      "0  112    0.216550\n",
      "1  122    0.138898\n",
      "2  153    0.090086\n",
      "3  178    0.273055\n",
      "4  196    0.243277\n",
      "             id  click   C17  C18  C19     C20  C21  hour_rate   C1_rate  \\\n",
      "0  1.005585e+19      0  2161    0   35      -1  157   0.174714  0.169331   \n",
      "1  6.837967e+18      0  2161    0   35  100212  157   0.173695  0.169331   \n",
      "2  1.052369e+19      0  2161    0   35  100148  157   0.174714  0.169331   \n",
      "3  1.391783e+19      0  2161    0   35  100034  157   0.173695  0.169331   \n",
      "4  1.503958e+19      0  2161    0   35      -1  157   0.150696  0.169331   \n",
      "\n",
      "   banner_pos_rate     ...      app_category_rate  device_id_rate  \\\n",
      "0         0.164272     ...               0.247588        0.174152   \n",
      "1         0.164272     ...               0.108118        0.174152   \n",
      "2         0.164272     ...               0.247588        0.174152   \n",
      "3         0.164272     ...               0.247588        0.174152   \n",
      "4         0.164272     ...               0.247588        0.174152   \n",
      "\n",
      "   device_ip_rate  device_model_rate  device_type_rate  device_conn_type_rate  \\\n",
      "0             0.0           0.152539          0.169176               0.181125   \n",
      "1             0.0           0.152539          0.169176               0.181125   \n",
      "2             0.0           0.267091          0.169176               0.181125   \n",
      "3             0.0           0.308108          0.169176               0.181125   \n",
      "4             0.0           0.108599          0.169176               0.181125   \n",
      "\n",
      "   C14_rate  C15_rate  C16_rate  avg(click)  \n",
      "0  0.079519  0.158608  0.158315    0.083492  \n",
      "1  0.079519  0.158608  0.158315    0.083492  \n",
      "2  0.079519  0.158608  0.158315    0.083492  \n",
      "3  0.079519  0.158608  0.158315    0.083492  \n",
      "4  0.079519  0.158608  0.158315    0.083492  \n",
      "\n",
      "[5 rows x 25 columns]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "             id  click   C17  C18  C19     C20  C21  hour_rate   C1_rate  \\\n",
      "0  1.005585e+19      0  2161    0   35      -1  157   0.174714  0.169331   \n",
      "1  6.837967e+18      0  2161    0   35  100212  157   0.173695  0.169331   \n",
      "2  1.052369e+19      0  2161    0   35  100148  157   0.174714  0.169331   \n",
      "3  1.391783e+19      0  2161    0   35  100034  157   0.173695  0.169331   \n",
      "4  1.503958e+19      0  2161    0   35      -1  157   0.150696  0.169331   \n",
      "\n",
      "   banner_pos_rate    ...     app_category_rate  device_id_rate  \\\n",
      "0         0.164272    ...              0.247588        0.174152   \n",
      "1         0.164272    ...              0.108118        0.174152   \n",
      "2         0.164272    ...              0.247588        0.174152   \n",
      "3         0.164272    ...              0.247588        0.174152   \n",
      "4         0.164272    ...              0.247588        0.174152   \n",
      "\n",
      "   device_ip_rate  device_model_rate  device_type_rate  device_conn_type_rate  \\\n",
      "0             0.0           0.152539          0.169176               0.181125   \n",
      "1             0.0           0.152539          0.169176               0.181125   \n",
      "2             0.0           0.267091          0.169176               0.181125   \n",
      "3             0.0           0.308108          0.169176               0.181125   \n",
      "4             0.0           0.108599          0.169176               0.181125   \n",
      "\n",
      "   C14_rate  C15_rate  C16_rate  C17_rate  \n",
      "0  0.079519  0.158608  0.158315  0.083492  \n",
      "1  0.079519  0.158608  0.158315  0.083492  \n",
      "2  0.079519  0.158608  0.158315  0.083492  \n",
      "3  0.079519  0.158608  0.158315  0.083492  \n",
      "4  0.079519  0.158608  0.158315  0.083492  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click  C18  C19     C20  C21  hour_rate   C1_rate  \\\n",
      "0  1.005585e+19      0    0   35      -1  157   0.174714  0.169331   \n",
      "1  6.837967e+18      0    0   35  100212  157   0.173695  0.169331   \n",
      "2  1.052369e+19      0    0   35  100148  157   0.174714  0.169331   \n",
      "3  1.391783e+19      0    0   35  100034  157   0.173695  0.169331   \n",
      "4  1.503958e+19      0    0   35      -1  157   0.150696  0.169331   \n",
      "\n",
      "   banner_pos_rate  site_id_rate    ...     app_category_rate  device_id_rate  \\\n",
      "0         0.164272      0.118826    ...              0.247588        0.174152   \n",
      "1         0.164272      0.118826    ...              0.108118        0.174152   \n",
      "2         0.164272      0.118826    ...              0.247588        0.174152   \n",
      "3         0.164272      0.118826    ...              0.247588        0.174152   \n",
      "4         0.164272      0.118826    ...              0.247588        0.174152   \n",
      "\n",
      "   device_ip_rate  device_model_rate  device_type_rate  device_conn_type_rate  \\\n",
      "0             0.0           0.152539          0.169176               0.181125   \n",
      "1             0.0           0.152539          0.169176               0.181125   \n",
      "2             0.0           0.267091          0.169176               0.181125   \n",
      "3             0.0           0.308108          0.169176               0.181125   \n",
      "4             0.0           0.108599          0.169176               0.181125   \n",
      "\n",
      "   C14_rate  C15_rate  C16_rate  C17_rate  \n",
      "0  0.079519  0.158608  0.158315  0.083492  \n",
      "1  0.079519  0.158608  0.158315  0.083492  \n",
      "2  0.079519  0.158608  0.158315  0.083492  \n",
      "3  0.079519  0.158608  0.158315  0.083492  \n",
      "4  0.079519  0.158608  0.158315  0.083492  \n",
      "\n",
      "[5 rows x 24 columns]\n",
      "   C18  avg(click)\n",
      "0    0    0.158225\n",
      "1    1    0.034176\n",
      "2    2    0.294976\n",
      "3    3    0.145953\n",
      "             id  click  C18  C19     C20  C21  hour_rate   C1_rate  \\\n",
      "0  1.005585e+19      0    0   35      -1  157   0.174714  0.169331   \n",
      "1  6.837967e+18      0    0   35  100212  157   0.173695  0.169331   \n",
      "2  1.052369e+19      0    0   35  100148  157   0.174714  0.169331   \n",
      "3  1.391783e+19      0    0   35  100034  157   0.173695  0.169331   \n",
      "4  1.503958e+19      0    0   35      -1  157   0.150696  0.169331   \n",
      "\n",
      "   banner_pos_rate  site_id_rate     ...      device_id_rate  device_ip_rate  \\\n",
      "0         0.164272      0.118826     ...            0.174152             0.0   \n",
      "1         0.164272      0.118826     ...            0.174152             0.0   \n",
      "2         0.164272      0.118826     ...            0.174152             0.0   \n",
      "3         0.164272      0.118826     ...            0.174152             0.0   \n",
      "4         0.164272      0.118826     ...            0.174152             0.0   \n",
      "\n",
      "   device_model_rate  device_type_rate  device_conn_type_rate  C14_rate  \\\n",
      "0           0.152539          0.169176               0.181125  0.079519   \n",
      "1           0.152539          0.169176               0.181125  0.079519   \n",
      "2           0.267091          0.169176               0.181125  0.079519   \n",
      "3           0.308108          0.169176               0.181125  0.079519   \n",
      "4           0.108599          0.169176               0.181125  0.079519   \n",
      "\n",
      "   C15_rate  C16_rate  C17_rate  avg(click)  \n",
      "0  0.158608  0.158315  0.083492    0.158225  \n",
      "1  0.158608  0.158315  0.083492    0.158225  \n",
      "2  0.158608  0.158315  0.083492    0.158225  \n",
      "3  0.158608  0.158315  0.083492    0.158225  \n",
      "4  0.158608  0.158315  0.083492    0.158225  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click  C18  C19     C20  C21  hour_rate   C1_rate  \\\n",
      "0  1.005585e+19      0    0   35      -1  157   0.174714  0.169331   \n",
      "1  6.837967e+18      0    0   35  100212  157   0.173695  0.169331   \n",
      "2  1.052369e+19      0    0   35  100148  157   0.174714  0.169331   \n",
      "3  1.391783e+19      0    0   35  100034  157   0.173695  0.169331   \n",
      "4  1.503958e+19      0    0   35      -1  157   0.150696  0.169331   \n",
      "\n",
      "   banner_pos_rate  site_id_rate    ...     device_id_rate  device_ip_rate  \\\n",
      "0         0.164272      0.118826    ...           0.174152             0.0   \n",
      "1         0.164272      0.118826    ...           0.174152             0.0   \n",
      "2         0.164272      0.118826    ...           0.174152             0.0   \n",
      "3         0.164272      0.118826    ...           0.174152             0.0   \n",
      "4         0.164272      0.118826    ...           0.174152             0.0   \n",
      "\n",
      "   device_model_rate  device_type_rate  device_conn_type_rate  C14_rate  \\\n",
      "0           0.152539          0.169176               0.181125  0.079519   \n",
      "1           0.152539          0.169176               0.181125  0.079519   \n",
      "2           0.267091          0.169176               0.181125  0.079519   \n",
      "3           0.308108          0.169176               0.181125  0.079519   \n",
      "4           0.108599          0.169176               0.181125  0.079519   \n",
      "\n",
      "   C15_rate  C16_rate  C17_rate  C18_rate  \n",
      "0  0.158608  0.158315  0.083492  0.158225  \n",
      "1  0.158608  0.158315  0.083492  0.158225  \n",
      "2  0.158608  0.158315  0.083492  0.158225  \n",
      "3  0.158608  0.158315  0.083492  0.158225  \n",
      "4  0.158608  0.158315  0.083492  0.158225  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click  C19     C20  C21  hour_rate   C1_rate  \\\n",
      "0  1.005585e+19      0   35      -1  157   0.174714  0.169331   \n",
      "1  6.837967e+18      0   35  100212  157   0.173695  0.169331   \n",
      "2  1.052369e+19      0   35  100148  157   0.174714  0.169331   \n",
      "3  1.391783e+19      0   35  100034  157   0.173695  0.169331   \n",
      "4  1.503958e+19      0   35      -1  157   0.150696  0.169331   \n",
      "\n",
      "   banner_pos_rate  site_id_rate  site_domain_rate    ...     device_id_rate  \\\n",
      "0         0.164272      0.118826           0.12275    ...           0.174152   \n",
      "1         0.164272      0.118826           0.12275    ...           0.174152   \n",
      "2         0.164272      0.118826           0.12275    ...           0.174152   \n",
      "3         0.164272      0.118826           0.12275    ...           0.174152   \n",
      "4         0.164272      0.118826           0.12275    ...           0.174152   \n",
      "\n",
      "   device_ip_rate  device_model_rate  device_type_rate  device_conn_type_rate  \\\n",
      "0             0.0           0.152539          0.169176               0.181125   \n",
      "1             0.0           0.152539          0.169176               0.181125   \n",
      "2             0.0           0.267091          0.169176               0.181125   \n",
      "3             0.0           0.308108          0.169176               0.181125   \n",
      "4             0.0           0.108599          0.169176               0.181125   \n",
      "\n",
      "   C14_rate  C15_rate  C16_rate  C17_rate  C18_rate  \n",
      "0  0.079519  0.158608  0.158315  0.083492  0.158225  \n",
      "1  0.079519  0.158608  0.158315  0.083492  0.158225  \n",
      "2  0.079519  0.158608  0.158315  0.083492  0.158225  \n",
      "3  0.079519  0.158608  0.158315  0.083492  0.158225  \n",
      "4  0.079519  0.158608  0.158315  0.083492  0.158225  \n",
      "\n",
      "[5 rows x 24 columns]\n",
      "   C19  avg(click)\n",
      "0   33    0.061630\n",
      "1   34    0.137013\n",
      "2   35    0.166803\n",
      "3   38    0.174400\n",
      "4   39    0.244163\n",
      "             id  click  C19     C20  C21  hour_rate   C1_rate  \\\n",
      "0  1.005585e+19      0   35      -1  157   0.174714  0.169331   \n",
      "1  6.837967e+18      0   35  100212  157   0.173695  0.169331   \n",
      "2  1.052369e+19      0   35  100148  157   0.174714  0.169331   \n",
      "3  1.391783e+19      0   35  100034  157   0.173695  0.169331   \n",
      "4  1.503958e+19      0   35      -1  157   0.150696  0.169331   \n",
      "\n",
      "   banner_pos_rate  site_id_rate  site_domain_rate     ...      \\\n",
      "0         0.164272      0.118826           0.12275     ...       \n",
      "1         0.164272      0.118826           0.12275     ...       \n",
      "2         0.164272      0.118826           0.12275     ...       \n",
      "3         0.164272      0.118826           0.12275     ...       \n",
      "4         0.164272      0.118826           0.12275     ...       \n",
      "\n",
      "   device_ip_rate  device_model_rate  device_type_rate  device_conn_type_rate  \\\n",
      "0             0.0           0.152539          0.169176               0.181125   \n",
      "1             0.0           0.152539          0.169176               0.181125   \n",
      "2             0.0           0.267091          0.169176               0.181125   \n",
      "3             0.0           0.308108          0.169176               0.181125   \n",
      "4             0.0           0.108599          0.169176               0.181125   \n",
      "\n",
      "   C14_rate  C15_rate  C16_rate  C17_rate  C18_rate  avg(click)  \n",
      "0  0.079519  0.158608  0.158315  0.083492  0.158225    0.166803  \n",
      "1  0.079519  0.158608  0.158315  0.083492  0.158225    0.166803  \n",
      "2  0.079519  0.158608  0.158315  0.083492  0.158225    0.166803  \n",
      "3  0.079519  0.158608  0.158315  0.083492  0.158225    0.166803  \n",
      "4  0.079519  0.158608  0.158315  0.083492  0.158225    0.166803  \n",
      "\n",
      "[5 rows x 25 columns]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "             id  click  C19     C20  C21  hour_rate   C1_rate  \\\n",
      "0  1.005585e+19      0   35      -1  157   0.174714  0.169331   \n",
      "1  6.837967e+18      0   35  100212  157   0.173695  0.169331   \n",
      "2  1.052369e+19      0   35  100148  157   0.174714  0.169331   \n",
      "3  1.391783e+19      0   35  100034  157   0.173695  0.169331   \n",
      "4  1.503958e+19      0   35      -1  157   0.150696  0.169331   \n",
      "\n",
      "   banner_pos_rate  site_id_rate  site_domain_rate    ...     device_ip_rate  \\\n",
      "0         0.164272      0.118826           0.12275    ...                0.0   \n",
      "1         0.164272      0.118826           0.12275    ...                0.0   \n",
      "2         0.164272      0.118826           0.12275    ...                0.0   \n",
      "3         0.164272      0.118826           0.12275    ...                0.0   \n",
      "4         0.164272      0.118826           0.12275    ...                0.0   \n",
      "\n",
      "   device_model_rate  device_type_rate  device_conn_type_rate  C14_rate  \\\n",
      "0           0.152539          0.169176               0.181125  0.079519   \n",
      "1           0.152539          0.169176               0.181125  0.079519   \n",
      "2           0.267091          0.169176               0.181125  0.079519   \n",
      "3           0.308108          0.169176               0.181125  0.079519   \n",
      "4           0.108599          0.169176               0.181125  0.079519   \n",
      "\n",
      "   C15_rate  C16_rate  C17_rate  C18_rate  C19_rate  \n",
      "0  0.158608  0.158315  0.083492  0.158225  0.166803  \n",
      "1  0.158608  0.158315  0.083492  0.158225  0.166803  \n",
      "2  0.158608  0.158315  0.083492  0.158225  0.166803  \n",
      "3  0.158608  0.158315  0.083492  0.158225  0.166803  \n",
      "4  0.158608  0.158315  0.083492  0.158225  0.166803  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click     C20  C21  hour_rate   C1_rate  banner_pos_rate  \\\n",
      "0  1.005585e+19      0      -1  157   0.174714  0.169331         0.164272   \n",
      "1  6.837967e+18      0  100212  157   0.173695  0.169331         0.164272   \n",
      "2  1.052369e+19      0  100148  157   0.174714  0.169331         0.164272   \n",
      "3  1.391783e+19      0  100034  157   0.173695  0.169331         0.164272   \n",
      "4  1.503958e+19      0      -1  157   0.150696  0.169331         0.164272   \n",
      "\n",
      "   site_id_rate  site_domain_rate  site_category_rate    ...     \\\n",
      "0      0.118826           0.12275             0.12858    ...      \n",
      "1      0.118826           0.12275             0.12858    ...      \n",
      "2      0.118826           0.12275             0.12858    ...      \n",
      "3      0.118826           0.12275             0.12858    ...      \n",
      "4      0.118826           0.12275             0.12858    ...      \n",
      "\n",
      "   device_ip_rate  device_model_rate  device_type_rate  device_conn_type_rate  \\\n",
      "0             0.0           0.152539          0.169176               0.181125   \n",
      "1             0.0           0.152539          0.169176               0.181125   \n",
      "2             0.0           0.267091          0.169176               0.181125   \n",
      "3             0.0           0.308108          0.169176               0.181125   \n",
      "4             0.0           0.108599          0.169176               0.181125   \n",
      "\n",
      "   C14_rate  C15_rate  C16_rate  C17_rate  C18_rate  C19_rate  \n",
      "0  0.079519  0.158608  0.158315  0.083492  0.158225  0.166803  \n",
      "1  0.079519  0.158608  0.158315  0.083492  0.158225  0.166803  \n",
      "2  0.079519  0.158608  0.158315  0.083492  0.158225  0.166803  \n",
      "3  0.079519  0.158608  0.158315  0.083492  0.158225  0.166803  \n",
      "4  0.079519  0.158608  0.158315  0.083492  0.158225  0.166803  \n",
      "\n",
      "[5 rows x 24 columns]\n",
      "      C20  avg(click)\n",
      "0      -1    0.192793\n",
      "1  100000    0.089420\n",
      "2  100001    0.172894\n",
      "3  100002    0.107297\n",
      "4  100003    0.046748\n",
      "             id  click  C20  C21  hour_rate   C1_rate  banner_pos_rate  \\\n",
      "0  1.005585e+19      0   -1  157   0.174714  0.169331         0.164272   \n",
      "1  1.503958e+19      0   -1  157   0.150696  0.169331         0.164272   \n",
      "2  4.861583e+18      0   -1  157   0.174714  0.169331         0.183614   \n",
      "3  6.567762e+18      0   -1  157   0.173695  0.169331         0.164272   \n",
      "4  1.527897e+19      0   -1  157   0.151206  0.169331         0.164272   \n",
      "\n",
      "   site_id_rate  site_domain_rate  site_category_rate     ...      \\\n",
      "0      0.118826          0.122750            0.128580     ...       \n",
      "1      0.118826          0.122750            0.128580     ...       \n",
      "2      0.078899          0.078899            0.179579     ...       \n",
      "3      0.076090          0.076090            0.179579     ...       \n",
      "4      0.118826          0.122750            0.128580     ...       \n",
      "\n",
      "   device_model_rate  device_type_rate  device_conn_type_rate  C14_rate  \\\n",
      "0           0.152539          0.169176               0.181125  0.079519   \n",
      "1           0.108599          0.169176               0.181125  0.079519   \n",
      "2           0.223869          0.169176               0.181125  0.079519   \n",
      "3           0.172533          0.169176               0.181125  0.079519   \n",
      "4           0.224491          0.169176               0.181125  0.079519   \n",
      "\n",
      "   C15_rate  C16_rate  C17_rate  C18_rate  C19_rate  avg(click)  \n",
      "0  0.158608  0.158315  0.083492  0.158225  0.166803    0.192793  \n",
      "1  0.158608  0.158315  0.083492  0.158225  0.166803    0.192793  \n",
      "2  0.158608  0.158315  0.083492  0.158225  0.166803    0.192793  \n",
      "3  0.158608  0.158315  0.083492  0.158225  0.166803    0.192793  \n",
      "4  0.158608  0.158315  0.083492  0.158225  0.166803    0.192793  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click  C20  C21  hour_rate   C1_rate  banner_pos_rate  \\\n",
      "0  1.005585e+19      0   -1  157   0.174714  0.169331         0.164272   \n",
      "1  1.503958e+19      0   -1  157   0.150696  0.169331         0.164272   \n",
      "2  4.861583e+18      0   -1  157   0.174714  0.169331         0.183614   \n",
      "3  6.567762e+18      0   -1  157   0.173695  0.169331         0.164272   \n",
      "4  1.527897e+19      0   -1  157   0.151206  0.169331         0.164272   \n",
      "\n",
      "   site_id_rate  site_domain_rate  site_category_rate    ...     \\\n",
      "0      0.118826          0.122750            0.128580    ...      \n",
      "1      0.118826          0.122750            0.128580    ...      \n",
      "2      0.078899          0.078899            0.179579    ...      \n",
      "3      0.076090          0.076090            0.179579    ...      \n",
      "4      0.118826          0.122750            0.128580    ...      \n",
      "\n",
      "   device_model_rate  device_type_rate  device_conn_type_rate  C14_rate  \\\n",
      "0           0.152539          0.169176               0.181125  0.079519   \n",
      "1           0.108599          0.169176               0.181125  0.079519   \n",
      "2           0.223869          0.169176               0.181125  0.079519   \n",
      "3           0.172533          0.169176               0.181125  0.079519   \n",
      "4           0.224491          0.169176               0.181125  0.079519   \n",
      "\n",
      "   C15_rate  C16_rate  C17_rate  C18_rate  C19_rate  C20_rate  \n",
      "0  0.158608  0.158315  0.083492  0.158225  0.166803  0.192793  \n",
      "1  0.158608  0.158315  0.083492  0.158225  0.166803  0.192793  \n",
      "2  0.158608  0.158315  0.083492  0.158225  0.166803  0.192793  \n",
      "3  0.158608  0.158315  0.083492  0.158225  0.166803  0.192793  \n",
      "4  0.158608  0.158315  0.083492  0.158225  0.166803  0.192793  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click  C21  hour_rate   C1_rate  banner_pos_rate  \\\n",
      "0  1.005585e+19      0  157   0.174714  0.169331         0.164272   \n",
      "1  1.503958e+19      0  157   0.150696  0.169331         0.164272   \n",
      "2  4.861583e+18      0  157   0.174714  0.169331         0.183614   \n",
      "3  6.567762e+18      0  157   0.173695  0.169331         0.164272   \n",
      "4  1.527897e+19      0  157   0.151206  0.169331         0.164272   \n",
      "\n",
      "   site_id_rate  site_domain_rate  site_category_rate  app_id_rate    ...     \\\n",
      "0      0.118826          0.122750            0.128580     0.201129    ...      \n",
      "1      0.118826          0.122750            0.128580     0.201129    ...      \n",
      "2      0.078899          0.078899            0.179579     0.198610    ...      \n",
      "3      0.076090          0.076090            0.179579     0.198610    ...      \n",
      "4      0.118826          0.122750            0.128580     0.077611    ...      \n",
      "\n",
      "   device_model_rate  device_type_rate  device_conn_type_rate  C14_rate  \\\n",
      "0           0.152539          0.169176               0.181125  0.079519   \n",
      "1           0.108599          0.169176               0.181125  0.079519   \n",
      "2           0.223869          0.169176               0.181125  0.079519   \n",
      "3           0.172533          0.169176               0.181125  0.079519   \n",
      "4           0.224491          0.169176               0.181125  0.079519   \n",
      "\n",
      "   C15_rate  C16_rate  C17_rate  C18_rate  C19_rate  C20_rate  \n",
      "0  0.158608  0.158315  0.083492  0.158225  0.166803  0.192793  \n",
      "1  0.158608  0.158315  0.083492  0.158225  0.166803  0.192793  \n",
      "2  0.158608  0.158315  0.083492  0.158225  0.166803  0.192793  \n",
      "3  0.158608  0.158315  0.083492  0.158225  0.166803  0.192793  \n",
      "4  0.158608  0.158315  0.083492  0.158225  0.166803  0.192793  \n",
      "\n",
      "[5 rows x 24 columns]\n",
      "   C21  avg(click)\n",
      "0    1    0.137430\n",
      "1   13    0.212600\n",
      "2   15    0.210885\n",
      "3   16    0.259501\n",
      "4   17    0.090817\n",
      "             id  click  C21  hour_rate   C1_rate  banner_pos_rate  \\\n",
      "0  1.005585e+19      0  157   0.174714  0.169331         0.164272   \n",
      "1  1.503958e+19      0  157   0.150696  0.169331         0.164272   \n",
      "2  4.861583e+18      0  157   0.174714  0.169331         0.183614   \n",
      "3  6.567762e+18      0  157   0.173695  0.169331         0.164272   \n",
      "4  1.527897e+19      0  157   0.151206  0.169331         0.164272   \n",
      "\n",
      "   site_id_rate  site_domain_rate  site_category_rate  app_id_rate  \\\n",
      "0      0.118826          0.122750            0.128580     0.201129   \n",
      "1      0.118826          0.122750            0.128580     0.201129   \n",
      "2      0.078899          0.078899            0.179579     0.198610   \n",
      "3      0.076090          0.076090            0.179579     0.198610   \n",
      "4      0.118826          0.122750            0.128580     0.077611   \n",
      "\n",
      "      ...      device_type_rate  device_conn_type_rate  C14_rate  C15_rate  \\\n",
      "0     ...              0.169176               0.181125  0.079519  0.158608   \n",
      "1     ...              0.169176               0.181125  0.079519  0.158608   \n",
      "2     ...              0.169176               0.181125  0.079519  0.158608   \n",
      "3     ...              0.169176               0.181125  0.079519  0.158608   \n",
      "4     ...              0.169176               0.181125  0.079519  0.158608   \n",
      "\n",
      "   C16_rate  C17_rate  C18_rate  C19_rate  C20_rate  avg(click)  \n",
      "0  0.158315  0.083492  0.158225  0.166803  0.192793    0.115366  \n",
      "1  0.158315  0.083492  0.158225  0.166803  0.192793    0.115366  \n",
      "2  0.158315  0.083492  0.158225  0.166803  0.192793    0.115366  \n",
      "3  0.158315  0.083492  0.158225  0.166803  0.192793    0.115366  \n",
      "4  0.158315  0.083492  0.158225  0.166803  0.192793    0.115366  \n",
      "\n",
      "[5 rows x 25 columns]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "             id  click  C21  hour_rate   C1_rate  banner_pos_rate  \\\n",
      "0  1.005585e+19      0  157   0.174714  0.169331         0.164272   \n",
      "1  1.503958e+19      0  157   0.150696  0.169331         0.164272   \n",
      "2  4.861583e+18      0  157   0.174714  0.169331         0.183614   \n",
      "3  6.567762e+18      0  157   0.173695  0.169331         0.164272   \n",
      "4  1.527897e+19      0  157   0.151206  0.169331         0.164272   \n",
      "\n",
      "   site_id_rate  site_domain_rate  site_category_rate  app_id_rate    ...     \\\n",
      "0      0.118826          0.122750            0.128580     0.201129    ...      \n",
      "1      0.118826          0.122750            0.128580     0.201129    ...      \n",
      "2      0.078899          0.078899            0.179579     0.198610    ...      \n",
      "3      0.076090          0.076090            0.179579     0.198610    ...      \n",
      "4      0.118826          0.122750            0.128580     0.077611    ...      \n",
      "\n",
      "   device_type_rate  device_conn_type_rate  C14_rate  C15_rate  C16_rate  \\\n",
      "0          0.169176               0.181125  0.079519  0.158608  0.158315   \n",
      "1          0.169176               0.181125  0.079519  0.158608  0.158315   \n",
      "2          0.169176               0.181125  0.079519  0.158608  0.158315   \n",
      "3          0.169176               0.181125  0.079519  0.158608  0.158315   \n",
      "4          0.169176               0.181125  0.079519  0.158608  0.158315   \n",
      "\n",
      "   C17_rate  C18_rate  C19_rate  C20_rate  C21_rate  \n",
      "0  0.083492  0.158225  0.166803  0.192793  0.115366  \n",
      "1  0.083492  0.158225  0.166803  0.192793  0.115366  \n",
      "2  0.083492  0.158225  0.166803  0.192793  0.115366  \n",
      "3  0.083492  0.158225  0.166803  0.192793  0.115366  \n",
      "4  0.083492  0.158225  0.166803  0.192793  0.115366  \n",
      "\n",
      "[5 rows x 25 columns]\n",
      "             id  click  hour_rate   C1_rate  banner_pos_rate  site_id_rate  \\\n",
      "0  1.005585e+19      0   0.174714  0.169331         0.164272      0.118826   \n",
      "1  1.503958e+19      0   0.150696  0.169331         0.164272      0.118826   \n",
      "2  4.861583e+18      0   0.174714  0.169331         0.183614      0.078899   \n",
      "3  6.567762e+18      0   0.173695  0.169331         0.164272      0.076090   \n",
      "4  1.527897e+19      0   0.151206  0.169331         0.164272      0.118826   \n",
      "\n",
      "   site_domain_rate  site_category_rate  app_id_rate  app_domain_rate  \\\n",
      "0          0.122750            0.128580     0.201129         0.201129   \n",
      "1          0.122750            0.128580     0.201129         0.201129   \n",
      "2          0.078899            0.179579     0.198610         0.194877   \n",
      "3          0.076090            0.179579     0.198610         0.194877   \n",
      "4          0.122750            0.128580     0.077611         0.194877   \n",
      "\n",
      "     ...     device_type_rate  device_conn_type_rate  C14_rate  C15_rate  \\\n",
      "0    ...             0.169176               0.181125  0.079519  0.158608   \n",
      "1    ...             0.169176               0.181125  0.079519  0.158608   \n",
      "2    ...             0.169176               0.181125  0.079519  0.158608   \n",
      "3    ...             0.169176               0.181125  0.079519  0.158608   \n",
      "4    ...             0.169176               0.181125  0.079519  0.158608   \n",
      "\n",
      "   C16_rate  C17_rate  C18_rate  C19_rate  C20_rate  C21_rate  \n",
      "0  0.158315  0.083492  0.158225  0.166803  0.192793  0.115366  \n",
      "1  0.158315  0.083492  0.158225  0.166803  0.192793  0.115366  \n",
      "2  0.158315  0.083492  0.158225  0.166803  0.192793  0.115366  \n",
      "3  0.158315  0.083492  0.158225  0.166803  0.192793  0.115366  \n",
      "4  0.158315  0.083492  0.158225  0.166803  0.192793  0.115366  \n",
      "\n",
      "[5 rows x 24 columns]\n"
     ]
    }
   ],
   "source": [
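    "# Replace every categorical feature with the average click rate of its value.\n",
    "train_rate = pd.read_csv('data/train_1.csv')\n",
    "feature_list = ['hour', 'C1', 'banner_pos', 'site_id', 'site_domain',\n",
    "       'site_category', 'app_id', 'app_domain', 'app_category', 'device_id',\n",
    "       'device_ip', 'device_model', 'device_type', 'device_conn_type', 'C14',\n",
    "       'C15', 'C16', 'C17', 'C18', 'C19', 'C20', 'C21']\n",
    "for column in feature_list:\n",
    "    # clickVS<column>.csv maps each value of `column` to its avg(click)\n",
    "    rate = pd.read_csv(path + 'clickVS' + column + '.csv', usecols=[column, 'avg(click)'])\n",
    "    # Join explicitly on the feature column, the only column the two frames share\n",
    "    train_rate = pd.merge(train_rate, rate, on=column)\n",
    "    train_rate = train_rate.rename(columns={'avg(click)': column + '_rate'})\n",
    "    # Drop the raw feature now that its click rate has been attached\n",
    "    train_rate = train_rate.drop([column], axis=1)"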
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_rate.to_csv('data/train_rate.csv',index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model training"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.linear_model import LogisticRegression\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "lr = LogisticRegression()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
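    "train = pd.read_csv('data/train_rate.csv')\n",
    "y_train = train['click']   # binary label: 1 = clicked, 0 = not clicked\n",
    "train = train.drop(['click'], axis=1)\n",
    "# Note: `train` still contains the `id` column, so it is part of X_train\n",
    "X_train = np.array(train)\n"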
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "             id  hour_rate   C1_rate  banner_pos_rate  site_id_rate  \\\n",
      "0  1.005585e+19   0.174714  0.169331         0.164272      0.118826   \n",
      "1  1.503958e+19   0.150696  0.169331         0.164272      0.118826   \n",
      "2  4.861583e+18   0.174714  0.169331         0.183614      0.078899   \n",
      "3  6.567762e+18   0.173695  0.169331         0.164272      0.076090   \n",
      "4  1.527897e+19   0.151206  0.169331         0.164272      0.118826   \n",
      "\n",
      "   site_domain_rate  site_category_rate  app_id_rate  app_domain_rate  \\\n",
      "0          0.122750            0.128580     0.201129         0.201129   \n",
      "1          0.122750            0.128580     0.201129         0.201129   \n",
      "2          0.078899            0.179579     0.198610         0.194877   \n",
      "3          0.076090            0.179579     0.198610         0.194877   \n",
      "4          0.122750            0.128580     0.077611         0.194877   \n",
      "\n",
      "   app_category_rate    ...     device_type_rate  device_conn_type_rate  \\\n",
      "0           0.247588    ...             0.169176               0.181125   \n",
      "1           0.247588    ...             0.169176               0.181125   \n",
      "2           0.199148    ...             0.169176               0.181125   \n",
      "3           0.199148    ...             0.169176               0.181125   \n",
      "4           0.108118    ...             0.169176               0.181125   \n",
      "\n",
      "   C14_rate  C15_rate  C16_rate  C17_rate  C18_rate  C19_rate  C20_rate  \\\n",
      "0  0.079519  0.158608  0.158315  0.083492  0.158225  0.166803  0.192793   \n",
      "1  0.079519  0.158608  0.158315  0.083492  0.158225  0.166803  0.192793   \n",
      "2  0.079519  0.158608  0.158315  0.083492  0.158225  0.166803  0.192793   \n",
      "3  0.079519  0.158608  0.158315  0.083492  0.158225  0.166803  0.192793   \n",
      "4  0.079519  0.158608  0.158315  0.083492  0.158225  0.166803  0.192793   \n",
      "\n",
      "   C21_rate  \n",
      "0  0.115366  \n",
      "1  0.115366  \n",
      "2  0.115366  \n",
      "3  0.115366  \n",
      "4  0.115366  \n",
      "\n",
      "[5 rows x 23 columns]\n"
     ]
    }
   ],
   "source": [
    "print(train.head())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "logloss of each fold is:  [0.51172749 0.50821552 0.51138838 0.5117736  0.50705172]\n",
      "cv logloss is: 0.5100313410040963\n"
     ]
    }
   ],
   "source": [
    "from sklearn.model_selection import cross_val_score\n",
    "loss = cross_val_score(lr, X_train, y_train, cv=5, scoring='neg_log_loss')\n",
    "print('logloss of each fold is: ',-loss)\n",
    "print('cv logloss is:', -loss.mean())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Regularized Logistic Regression and hyperparameter tuning"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "GridSearchCV(cv=5, error_score='raise',\n",
       "       estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
       "          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,\n",
       "          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,\n",
       "          verbose=0, warm_start=False),\n",
       "       fit_params=None, iid=True, n_jobs=1,\n",
       "       param_grid={'penalty': ['l1', 'l2'], 'C': [0.001, 0.01, 0.1]},\n",
       "       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',\n",
       "       scoring='neg_log_loss', verbose=0)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.model_selection import GridSearchCV\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "\n",
    "# Hyperparameters to tune.\n",
    "# Try tuning the L1 and L2 penalties separately, each paired with a suitable solver.\n",
    "# tuned_parameters = {'penalty': ['l1', 'l2'],\n",
    "#                     'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000]\n",
    "#                    }\n",
    "penaltys = ['l1', 'l2']\n",
    "Cs = [0.001, 0.01, 0.1]\n",
    "tuned_parameters = dict(penalty=penaltys, C=Cs)\n",
    "\n",
    "# liblinear supports both the L1 and L2 penalties;\n",
    "# return_train_score=True keeps the training scores in cv_results_ for the plot below.\n",
    "lr_penalty = LogisticRegression(solver='liblinear')\n",
    "grid = GridSearchCV(lr_penalty, tuned_parameters, cv=5, scoring='neg_log_loss',\n",
    "                    return_train_score=True)\n",
    "grid.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 6:55"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'mean_fit_time': array([2.72261087e+02, 1.56537533e-01, 2.56909485e+02, 1.50077152e-01,\n",
       "        1.29515524e+02, 1.51040411e-01]),\n",
       " 'mean_score_time': array([0.00545583, 0.00762744, 0.00547543, 0.00697789, 0.00585942,\n",
       "        0.00714774]),\n",
       " 'mean_test_score': array([-0.35312097, -0.51003139, -0.31188179, -0.51003139, -0.32254632,\n",
       "        -0.51003139]),\n",
       " 'mean_train_score': array([-0.35257568, -0.51002551, -0.30160881, -0.51002551, -0.29662265,\n",
       "        -0.51002551]),\n",
       " 'param_C': masked_array(data=[0.001, 0.001, 0.01, 0.01, 0.1, 0.1],\n",
       "              mask=[False, False, False, False, False, False],\n",
       "        fill_value='?',\n",
       "             dtype=object),\n",
       " 'param_penalty': masked_array(data=['l1', 'l2', 'l1', 'l2', 'l1', 'l2'],\n",
       "              mask=[False, False, False, False, False, False],\n",
       "        fill_value='?',\n",
       "             dtype=object),\n",
       " 'params': [{'C': 0.001, 'penalty': 'l1'},\n",
       "  {'C': 0.001, 'penalty': 'l2'},\n",
       "  {'C': 0.01, 'penalty': 'l1'},\n",
       "  {'C': 0.01, 'penalty': 'l2'},\n",
       "  {'C': 0.1, 'penalty': 'l1'},\n",
       "  {'C': 0.1, 'penalty': 'l2'}],\n",
       " 'rank_test_score': array([3, 5, 1, 4, 2, 5], dtype=int32),\n",
       " 'split0_test_score': array([-0.36632289, -0.51172749, -0.33142827, -0.51172749, -0.3292452 ,\n",
       "        -0.51172749]),\n",
       " 'split0_train_score': array([-0.34774507, -0.50960117, -0.29526949, -0.50960117, -0.29168284,\n",
       "        -0.50960117]),\n",
       " 'split1_test_score': array([-0.37405779, -0.50821552, -0.33637127, -0.50821552, -0.33495795,\n",
       "        -0.50821552]),\n",
       " 'split1_train_score': array([-0.34462808, -0.51048025, -0.29412823, -0.51048025, -0.29023123,\n",
       "        -0.51048025]),\n",
       " 'split2_test_score': array([-0.34024933, -0.51138838, -0.28245545, -0.51138838, -0.27892914,\n",
       "        -0.51138838]),\n",
       " 'split2_train_score': array([-0.35730774, -0.50968638, -0.30825382, -0.50968638, -0.30423489,\n",
       "        -0.50968638]),\n",
       " 'split3_test_score': array([-0.33954891, -0.5117736 , -0.32203757, -0.5117736 , -0.36469799,\n",
       "        -0.5117736 ]),\n",
       " 'split3_train_score': array([-0.35757   , -0.50958944, -0.297268  , -0.50958944, -0.29151061,\n",
       "        -0.50958944]),\n",
       " 'split4_test_score': array([-0.34542488, -0.50705172, -0.28711419, -0.50705172, -0.30490011,\n",
       "        -0.50705172]),\n",
       " 'split4_train_score': array([-0.35562752, -0.51077032, -0.31312451, -0.51077032, -0.30545366,\n",
       "        -0.51077032]),\n",
       " 'std_fit_time': array([6.84766288e-01, 1.12061935e-03, 2.46126758e+01, 5.92886703e-03,\n",
       "        1.04019599e+02, 1.60593804e-03]),\n",
       " 'std_score_time': array([0.00042976, 0.00037862, 0.00044438, 0.00020491, 0.00057962,\n",
       "        0.00040398]),\n",
       " 'std_test_score': array([0.01429515, 0.00199645, 0.02264629, 0.00199645, 0.02894657,\n",
       "        0.00199645]),\n",
       " 'std_train_score': array([0.00535067, 0.00049935, 0.0076388 , 0.00049935, 0.00674267,\n",
       "        0.00049935])}"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# view the complete results (a dict mapping each result name to an array with one entry per parameter combination)\n",
    "grid.cv_results_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.31188178822656837\n",
      "{'C': 0.01, 'penalty': 'l1'}\n"
     ]
    }
   ],
   "source": [
    "# examine the best model\n",
    "print(-grid.best_score_)\n",
    "print(grid.best_params_)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZEAAAEHCAYAAABvHnsJAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3Xt0lfWd7/H3NxeScAtJuIOQIALeWqhJpDZaoSN4wUrp2NpOUUDLHKeOXae166B2WqfTGWeOPeOpq05XPVOl43TaLqbVKq1IBWlLi9xqQVHBC0HCTUiAEEjI7Xv+eDZJCDvJ3g/7EsLntVZWnv3k92R/s318Pvyey+9n7o6IiEgYGekuQEREzl0KERERCU0hIiIioSlEREQkNIWIiIiEphAREZHQFCIiIhKaQkREREJTiIiISGhZ6S4g2YYOHerFxcXpLkNE5JyyefPmQ+4+rKd2fT5EiouL2bRpU7rLEBE5p5jZrlja6XSWiIiEphAREZHQFCIiIhKaQkREREJTiIiISGgKERERCU0hIiIioSlEREQkNIWISJp89gfr+OwP1qW7DOmrnrop+EoyhYiIiISmEBERkdAUIiIiElqfH4BRpLf6RvXXIktr01qH9E3b9h0F4NIkv49CRESkt2tthZaT0Bz56rjc9roBmhvbfjakpYaTlpv00hQiIiLRuENrc+Qg3Rg5SHd3EI/9AB/7dpGv1qa4yx8DHMoYmvjPpZO0hIiZ5QL/DVwAbAVud3eP0i4L+AkwGtju7oti3VZEzlHuHQ6ikYN3dwfxsD874z2i/MxbE/AHGWTlQla/4HtmToflyPd+/SGrMPI6J/jKzOm0Xb/21zH8bMcPPk8LmSQ7RtLVE/kCUOXuc8xsOXAdsDJKu7nAFne/1cxeMLOpQGmM24pIPFpb2g+4bQff7g7iYX/W+QDfqW1LY2L+noysLg6wp75yIXtIlINxdwfxEAf4jCwwS8zfFIcm65eS90lXiMwEfh5ZXg3MIHoQrAB+HemRDAFqY9nWzBYDiwHGjRuX6NpFEse900H0ZKeDb5SDcaifdfzXdhc/85bE/E2xHGBz8888GJ92gM8584Af10E8BzIyE/P3SLfSFSJFwNHIci0wOVojd68DMLP1wD53f8/MetzW3Z8AngAoLS3VqS7pPeoOwtsrYccKpjRuI5NW+HaP01j3zDK7OPh2eJ07uPsDddifdTzAZ/ZLy7+6JX3SFSKHgPzIcn7k9RkigVEHXAWsNrMZsW4r0iu4w/7XYMeLsGMF7NkMOAwaTW1GPk1kM/zaxR3OlYc8wGfqHhlJj3TteauAWQSnpWYCj3bR7qvAG+7+n2Z2AsiLY1uR9Gg8ATt/F4TG2yuhdg9gMOYKmPEgTJoNIy9n78NXAzD8mq91//tEQvhW0SMA/CzJ75OuEPkxMM/MtgJbgFVmVgJ8yd3v69DuceBpM/sS8C7wIkHNp22b2tJFoji6B95+EbavgJ2/Da4x9BsIF84IguOi62Dg8HRXKZJwaQkRdz8JzOm0eidwX6d2ewh6Gx21RNlWJLVaW2Hvn4Lexo4VwSkrgCHj4YoFQW9j/MeC000ifZhOpIrEqqEW3ns5uL7x9ko4fjC4oD1uOlz3LZh0PQydpAvLcl5RiIh0p+a99ovilX8InhzOHRKcnpp0PVw4E/oXprtKkbRRiIh01NIMu1+JnKZ6EQ7tCNYPnQzT74bJN8DYct0NJRKh/xNETtTAO6uC4HjnN9BwFDKyobgCSu+ESbOgcELC3zZVd8+IJJNCRM4/7nBwe3tvY/crwRhJA4bBlJuDi+IXzoCcQQl+W2d3TT0bKmvYsLOa1/YcxYC/+vdXGJybHXzlZUW+ZzMot3254/oB/TIxXXeRXkIhIueH5pNQubb9+saRXcH6kR+Cq+8Lrm+MngYZiZunrbXVefuDukho1LBxZw37axsAyM/LJjszAwMa
mlr5oLaO2oYmauubqW/qfviRDCMIlo6hkxsJnc7r87IZfGp9ZHlAvywyMhRCkhgKEem7jh1oG2KEd1+GpuPBk+ETroWK/wkXzYL8MQl7u+aWVrbtrWXDzho2VNawsbKGIyeCIbyHD8qhvKSQK0sKKSspZNLwQXzu/70CwM/++qOn/Z6mllaONTRTW9/UFizB92ivmznW0MTOQ8fb1h1v7DmEBuZ0FThdBFCHkBqUoxCSdgoR6TvcYf/WoLex/YXgOQ6AwWPgw58NehvFVwfDbidAQ1MLf959JOhlVNaweddhTkQO4OOL+nPdxSMoiwTHuML+MZ+Cys7MoHBAPwoHhBuFtflUCEUJoPZwOj2k3q850ba+7mRzt7/fToVQlKA5FUDRekX5kdcDc7PIVAj1GQoRObc1ngieED91fePYPsBgbCnM/HoQHCMuS8izG7UNTWzedZiNO4PTU1urjtLYEsw3MWXkIP7yirGUFRdSXlLIiMHJn1GuK1mZGRQM6EfBWYRQ3cnm9gCK0vvp3CvaXXOiLbiONXQfQgCDIj2h9rDpPpQ6vh6Yk0VWZuJOO8rZUYjIuefI7mCIkR0vBmNUNTdAv0EwcWYQGhOvg4FnPzLuobqTQWBErmm8ua+WVoesDOOyMfks+Fgx5cWFlBYXMKR/auZuSIWszAyG9O8X+m9qafVICHUfQB1P2e090sBbDceorW/i2MlmeppmbkC/zB5Pv3V1k8Kg3CyyFUIJoxCR3q+1JRj99lRv48DrwfqCEihdFNxNNe6qYC6Js1B1+ETbqan1O2t47+BxAHKyMvjIuALumXkRV5YUMm3cEPr30/86XcnMMPLzgtNXYbS2OnWNkYCpbz4jcKJdI9pf28COD45RWx9cI2rtIYT698uM6fpPtDaDcrPpl6UQOkX/J0jv1HAU3l3dPsTIierIECMfhev+ITLEyEWhT1O5O+8erGPDzsNs2FnNxsrD7DlSD8Cg3CzKigu59YoLKC8p5PIx+TpopFBGhrXdcUZB/Nu7O8cbW04Pnbbl6KfjDh47ybsH69p+3tJDCuVlZ3bR0+n5JoVBuVnkZPWdCbMUItJ7VL/bfgvurj9AazPkFQSnpybNhomfCF6H0NzSypv7jrU9o7Gp8jDVx4NpWIcOzKG8pIAvXl1CWUkhU0YO1oXfc5iZMTAnuHYymry4t3d3TjS2dH1nXOSUW8d1NccbqTx0vC2gmnsIoZysjCin32I5HRd8z83uPSGkEJH0aWmC9zsMMVL9drB+2MXw0XuC3sbYslBDjJxsbmFr1dHgdtudwZ1Tp+46GluQx8cnDwtuty0upGToAD28J23MjAE5WQzIyWJUfs/tO3N36ptaOpyK6/nGhCMnGjvcIddEU0v3IdQvK6Pbns7g3GwO1DYwICf5h3iFiKTWiRp4+zeRIUZWwcmjwZSqxVdD+eJgiJGC4rh/bd3JZv6063DbMxp/3n2ExubgzqmLhg/klqmjKY+Exugh8f/rVCRWZkb/fln075fFyPz479Jzd042t7YFytEeAujU9aKqwyfaekqn7hocHeL946UQkeRyhw/ebO9tVG2IDDEyHC65GSbdEDz8lzMwrl9bc7yRjaeeBK+sYdveWlpancwM49LRg7l9+njKIqER9nkLkXQwM3KzM8nNzmR4yFvFG5pa+NwTr5CKs7IKEUm8pgbY1XGIkfeD9aM+DNd8Lbi+MSq+IUb2Ha1vOzW1YWcNb39QBwTd+qkXDOFvrr2QsuJCPjK+gIEp6MKL9Ga52ZkpuxlE/7dJYhzbHxli5MUOQ4zkBQMZXv3VYIiRwaNj+lXuzs5Dx9tOTW3YWUPV4eDOqYE5WVwxvoC508ZQXlLIh8bm96k7XUTONQoRCae1FfZvae9t7H01WD94LEz9XGSIkQrI7vn6Q0ur89b+2g4P9h3mUN1JAAoH9KOsuICFHyvhypJCpowcpKeVRXoRhYjErvE4vLcmcn1jJdTtJxhipAw+8Y0gOIZf
0uOzG43Nrby250jbMxqbdh1uGypjdH4uFROLKC8porykgAuHDdSdUyK9mEJEunfk/fbexs7fQ8tJyBkcTAs76fpgmtgBQ7v9FScam3n1/SOs3xk8o/Hn3UdoaAruHrlw2ADmfGhU251TYwsSMziiiKSGQkRO19oCVZva76b6YFuwvnAClN0VGWLko90OMXLkRCObKg+3Xc94fc9RmludDIOLRw3mc+XjuLKkkNLiQoYOzEnRHyYiyaAQkWCIkXdWtQ8xUl8DGVlBWMz6x8gQIxO73PxAbUPbXVMbK2t4a/8xAPplZvChsfksvmYCZSWFXDG+IBjKQkT6DIXI+erQO5Hexgp4f11kiJHC4C6qSbOD01V5Q87YzN15v+YE6yMz9W2orGFX9QkgGNTuivEF3HT5KMpKCpl6wZBeNTyDiCSeQuR80dIEu/7Yfn2j5t1g/fBL4Kq/bR9iJOP0g35rq7Pjg2OnPaPxwbHgzqkh/bMpKy7kC1eOp7ykkEtGD9YQ2yLnGYVIX3a8Gt7pOMRIbTDESMk1MP3uoNdRMP60TZpaWnl9z9EOT4Mf5mh9MMXryMG5TJ9Q1DZb38RhAzVN6lnoPC2uyLlIIdKXuMMHb7RfFN+9AXAYOAIunRv0Nko+ftoQI/WNLby6+zAbdx5mQ2U1f9p1hPqmYIrXkqEDmH3piOB22+JCLijM0+22InIahci5rqkBKn/fHhxHdwfrR02Fa5cE1zdGfrhtiJGj9U1sfutA2zMar+05SlOLYwZTRg7mM6VjKS8poqy4IPS4PSJy/lCInItq97VPD/veGmg6Adn9YcKMYGyqi2bB4FEAHDx2ko3bDrRdz3hzfy0emeL18rH5LKoIngS/Ylwh+f1155SIxEch0oXP/mAd0EvOW7e2wr4/Ry6KvwD7tgTr8y+AqX/VNsSIZ+VQdbieDW/XsGHnVjZW1vDeoWCK19zsYIrXL3/iIsqLC5k2roC8frpzSkTOjkKktzpZ1z7EyNsroe4AWAaMLYdPfBMmXY8Pm8I7B48Ht9tufpMNO2vYd7QBgMGRKV4/UxZM8XrZaE3xKiKJpxDpTQ7var8Ft/L30NIIOfnBtLCTrqd5wkzeOJodnJp6sYaNlS9x+ERw59SwQTmUlxRSXlxIeUkhk0cM0p1TIpJ0CpF0ammGqo3tF8UPvhmsL5oI5YtpnHAdf7YpbHj/GOs31fCnn2/meGNw59S4wv584uIRbaExvqi/7pwSkZRTiKRa/RF4t+MQI4eDIUbGX0XD5Z9nS96V/K4mnw07a9jyu6M0tmwGYPKIQXzqI2PabrcNM+2miEiiKUSSzR2q32nvbez6I3gL9C+ioeQ63hx8FSsbLuX3u0/yxgu1tPohMjOquWxMPndcNZ7ykiJKxxdQoCleRaQXUogkQ3MjvN9xiJH3AGgaegnvTbyT1a0f4ZkPRrDjT8FsfTlZ1Uy9YAj3zJhIeUkR08YNYYCmeBWRc0BajlRmlgv8N3ABsBW43d09Srss4CfAaGC7uy8yszLgGaAy0uxOd9+e6Bq/Uf21yNLa2DaoO9hhiJHV0HiM1swc9heW8ccRn+S/jlzMn6oGQRUMysniiuKBzL1iHOXFhVyuKV5F5ByVrn/ufgGocvc5ZrYcuA5YGaXdXGCLu99qZi+Y2VSgAPi+u/9jCus9kzsceL3tNJVXbcJwTvQbxubca3im+TJeOD6Z+uO5FA3oR3lJId+8Jph46eJRg8nUnVMi0gekK0RmAj+PLK8GZhA9RFYAv470SIYAtcBk4NNmdguwG/jLaL2YpGiqD2b327EC37ECq90DwHv9JvOC38qvGz/MtoZixmT058rJhXyjJLhzasLQAbpzSkT6pHSFSBFwNLJ8KhjO4O51AGa2Htjn7u+ZWQHwd+7+KzP7I/BxYE3H7cxsMbAYYNy4cWdXae1e2PEizW+twHauIbOlgQbL5Xctl/OblptY0zKV/MFjKb+4kC8WF1JWUsiYIXln954iIueI
dIXIISA/spwfeX0GMysC6oCrgNVmNoPgGsrrkSaVwPDO27n7E8ATAKWlpaF6KRktJxnWcgD+9WIA9vkwVrVcw+rWj3Bs5HSmlYzgEyWFLCkuoEhTvIpIL5OqIZvSFSKrgFkEp7RmAo920e6rwBvu/p9mdgLIA74C7DCzp4HLgG8no8DXW8YzvtV4ym9k3/CPM+qiqZRPGMrj44YwSFO8iogA6QuRHwPzzGwrsAVYZWYlwJfc/b4O7R4HnjazLwHvAi8CrxLcsXUP8Iy7v5GMAsdmHWVn64X87f2Pa4pXEZEupCVE3P0kMKfT6p3AfZ3a7SHoqXS0D7g2acVFDM6o57KM3QoQEZFuaFhXEREJTSEiIiKhaWyNLnyr6BEAfpbmOkREejP1REREJDSFiIiIhKYQERGR0BQiIiISmkJERERCU4iIiEhoChEREQlNz4l0IVUjYIqInMvUExERkdDUExGRPqepqYmqqioaGhrSXUqvl5uby9ixY8nODjfFhUJERPqcqqoqBg0aRHFxsaam7oa7U11dTVVVFSUlJaF+h05niUif09DQQFFRkQKkB2ZGUVHRWfXYYgoRM8sws8FmlmlmM8xsUOh3FBFJgXgD5LM/WMdnf7AuSdX0XmcbtLH2RH4GXAN8B7gTePas3lVERPqEWENklLsvBya4+xeAgUmsSUSkT7j22mtPe93U1MTNN9/c7TaPPfYYFRUV5OXlUVFRwS9+8Yu43vPZZ5/lyJEj8ZYaWqwX1mvM7FngNTObA6SuQhGRPqC+vp4rr7ySHTt2dNvu3nvv5d5772XixImsXbs27vd59tlnmTp1KkOGDAlbalxiDZFbgUvd/U9mNhX4TBJrEhFJmL9/fhtv7K3tsd0b+4I2sVwXuWT0YL5586Vx1ZGXl8fWrVuZOHFiXNsBHDhwgDvuuIPDhw8zd+5c7r//frZv386iRYtobGxk7ty5PPjgg8yaNYstW7awbds2KioqePTRR+N+r3jFejqrCXjHzDKBAqA1eSWJiEhHDz/8MLfddhvr16/nl7/8JdXV1Sxfvpx58+axceNGxo0bB8DKlSu54YYbWLZsWUoCBGLvifwM+BHwCWAY8PXIsohIrxZrj+FUD6Q3Dnm0fft21q1bx9KlS6mrq2Pv3r3Mnz+fJUuWMGfOHG666aa01RZriIxy9+Vm9kV3v8XM1ie1KhERaTN58mRuueUWZsyYwdKlSykoKGD16tUsWbKEiRMnMmHCBO666y6ys7PJy8vj+PHjKast1tNZurAuIpImS5Ys4ZFHHmH69Om89NJLjBw5kokTJzJ//nzKysq4/vrr24Ytuf3227nzzjspKyujvr4+6bWZu/fcyCyH0y+s73T3o0mvLgFKS0t906ZN6S5DRFLozTff5OKLL45rm958OivZon1eZrbZ3Ut72jbW01ktwBVmNh/YBrwed5UiIr3Y+RgeiRDr6aylwBhgReT70iTVIyIi55BYeyLjI0+qA7xoZvE/ASMiIn1OrCGy28weBNYB04H3k1eSiEgaPBW5TXbhr9Jbxzkm1tNZCwjuyPp05PuCJNUjIiLnkJhCxN0b3f1xd/+Su/+buzcmuzARkXNdOgZgfPXVV3nyySfjLTU0zWwoIpICqRqAcdq0aUybNi1smXHrNkTM7GWg84MkBri7z0xaVSIiifLCEtj/Ws/t9m8Nvj8VwxAiIy+HG/45rjLOZgDGyspKHnzwQfLy8mhtbeXJJ59k27ZtLFiwgIyMDBYsWMDdd98NwJo1a1izZg0PPfQQEPSGbrrpJpYtW8bIkSN57rnn4n7/7nQbIu4+I6HvJiIioTz//POsXLmS6dOnA7Bnzx5++MMfMmrUKG688ca2EIkmNzeXDRs2MGPGDPbu3cvo0aMTVpdOZ4lI3xZrj6GX3501a9astgAByMzM5IEHHmDo0KE0Nzd3u+3ChQsBGD9+PI2Nib2kHVOImNn97v5wh9eXAEPd/XcJrUZERKIaOPD0CWUf
eughfvrTn5KZmcmsWbPi2jaRYr3F92IzW2dmt0VePwR8NeybmlmumS03sy1m9rT1MFO8mX3FzF6KLA81s9+b2WtmFt9JSRGRPmLevHnMnj2bxYsX09zcTENDQ1rqiHUAxg3AVcAad68wszVAk7tfF+pNze4CSt39f5jZcuAxd1/ZRdvxwHPAQXf/CzP7NlAH/G/gVeBWd+/ydgcNwChy/gkzAGNvP52VTGczAGOsPZFq4HEg18xuASYBOfEW2sFM4DeR5dVAdxfwvwvc33lbd28FftvDtiIisVn4q/MyQM5WrBfWPwVMAfYA1wOzCIY/CasIODWUfC0wOVojM/s8sAV4o5ttC6NstxhYDLRNGykiIokXa4g0A+XAxQRDwb/l7mczHPwhID+ynB95Hc0cYBwwG5hsZvdE2XZX543c/QngCQhOZ51FnSIi0o14hoIfTeKGgl9F0JuB4PTUy9Eaufvn3b0CuA3Y7O7fO7WtmWUAH+9qWxGReCxcsZCFKxamu4xzTqwhMt7dH3L3F93974His3zfHwNjzGwrUAOsMrMSM/tODNs+BtwIbAV+5e7vnGUtIiISUqwhstvMHjSzmWb2AGc5FLy7n3T3Oe7+IXef74Gd7n5fF+0r3f0vIsuH3P1qd7/M3e+P1l5EpDfoPADjHXfcwfTp0/nkJz/Z5QOCZzsA47PPPsuRI0fClhw3DQUvIpICa9eupbm5mVdeeYXa2lpWroz6VAP33nsva9euZcyYMaxdu5Z58+bF9T6pDpGYLqxHhn5/PMm1iIgk3L9s+Bfeqnmrx3an2sRyXWRK4RT+V/n/iquOESNG8OUvfxmA1tbWuLY9cOAAd9xxB4cPH2bu3Lncf//9bN++nUWLFtHY2MjcuXN58MEHmTVrFlu2bGHbtm1UVFTw6KOPxvU+YcTaExERkbNw0UUXUV5ezjPPPENGRkaPQ5V09PDDD3Pbbbexfv16fvnLX1JdXc3y5cuZN28eGzdubHuUYeXKldxwww0sW7YsJQECGgpeRPq4WHsMp3ogT13/VNJqee6553jsscd4/vnnycqKffzb7du3s27dOpYuXUpdXR179+5l/vz5LFmyhDlz5nDTTTEMX58kGgpeRCQF9u/fzyOPPMKKFSsYMGBAXNtOnjyZW265hRkzZrB06VIKCgpYvXo1S5YsYeLEiUyYMIG77rqL7Oxs8vLyOH78eJL+ijPpdJaISAr86Ec/Yt++fcyePZuKioq4prBdsmQJjzzyCNOnT+ell15i5MiRTJw4kfnz51NWVsb1119PdnY2ALfffjt33nknZWVl1NfXJ+vPaRPTAIwAZvZtd/96kutJOA3AKHL+CTMAYypOZ/VWZzMAYzyTUl0Tb2EiIueK8zE8EkGns0REJLR4QqTbiaNEROT8E0+IbE1aFSIiabZr/u3smn97uss458QUIpE51r/U4fUlZqZrJCIi57m0zLEuInI+SMcAjK+++mpctw+frVhDZApwNXBP5PVwoH9SKhIR6YNSNQDjtGnTWLRoUSJKjkmst/hGm2Nd83iISK+3/5/+iZNv9jwAY8NbQZtYrovkXDyFkQ88EFcdZzMAY2VlJQ8++CB5eXm0trby5JNPsm3bNhYsWEBGRgYLFizg7rvvBmDNmjWsWbOGhx56CAh6QzfddBPLli1j5MiRPPfcc3G9d0/SNce6iMh55aKLLgIINQAjwPPPP8/KlSuZPj049O7Zs4cf/vCHjBo1ihtvvLEtRKLJzc1lw4YNzJgxg7179zJ69Ojwf0gnsYbISWAscAXwFlDj7v+esCpERJIk1h7DqR7I+Kf/I2m1hB2AEWDWrFltAQKQmZnJAw88wNChQ7u8vnLKwoXB0/jjx4+nsbEx/sK7Eetf8TNgN8F1kfuA/wKuTWglIiJ92NkMwAgwcODA014/9NBD/PSnPyUzM7PHXk3nbRMp1hAZ6e6fMbPV7v47M8tMWkUiIn1QxwEYARYtWnRWF8DnzZvH7Nmz
mTBhAs3NzTQ0NJCbm5uocmMW0wCMZvYEQeB8FPgpMNrd/zrJtSWEBmAUOf+EGYAxFaezeqtUDMD4f4FbgNsi3xN7eV9EJM3Ox/BIhFhDZBnwz8ANSaxFRCRh3B0zDfnXk1inA+lKrCFyEPiJu3d/C4CISC+Qm5tLdXU1RUVFCpJuuDvV1dVndS0l1hDZBKwxs/8C6iJvrr6fiPRKY8eOpaqqioMHD6a7lF4vNzeXsWPHht4+1hB5LfIFGhJeRHq57OxsSkpK0l3GeSGmEHH3HyW7EBEROfdoZkMREQlNISIiIqEpREREJDSFiIiIhKYQERGR0BQiIiISmkJERERCU4iIiEhoChEREQlNISIiIqEpREREJLS0hIiZ5ZrZcjPbYmZPWw9jNZvZV8zspcjyrWb2jpmtjXzlp6ZqERHpLF09kS8AVe7+YaAAuK6rhmY2Hrijw6oC4JvuXhH5OprcUkVEpCvpCpGZwG8iy6uBGd20/S5wf4fXBcA9ZvaqmX032gZmttjMNpnZJs0nICKSPOkKkSLgVA+iFiiM1sjMPg9sAd7osHozcB9QCnzKzIo7b+fuT7h7qbuXDhs2LIFli4hIR7FOSpVoh4BT1zLyI6+jmQOMA2YDk83sHoL53g+5e4uZVQHDgcrklisiItGkqyeyCpgVWZ4JvBytkbt/3t0rgNuAze7+PeBfgQozyyMImLdTUK+IiESRrhD5MTDGzLYCNcAqMysxs+/EsO0/Af8MrAW+5e6Hk1iniIh0w9w93TUkVWlpqW/atCndZYiInFPMbLO7l/bUTg8biohIaAoREREJTSEiIiKhKURERCQ0hYiIiISmEBERkdAUIiIiEppCREREQlOIiIhIaAoREREJTSEiIiKhKURERCQ0hYiIiISmEBERkdAUIiIiEppCREREQlOIiIhIaAoREREJTSEiIiKhKURERCQ0hYiIiISmEBERkdAUIiIiEppCREREQlOIiIhIaAoREREJTSEiIiKhKURERCQ0hYiIiISmEBERkdAUIiIiEppCREREQlOIiIhIaAoREREJTSEiIiKhKURERCS0tISImeWa2XIz22JmT5uZddGuzMyqzGxt5GtyrNuKiEjypasn8gWgyt0/DBQA13XRrgCccnIbAAAGtUlEQVT4vrtXRL62x7GtiIgkWbpCZCbwm8jyamBGF+0KgE+b2QYz+3mk1xHrtiIikmTpCpEi4GhkuRYo7KLdO8DfuXs5MAr4eCzbmtliM9tkZpsOHjyY0MJFRKRdukLkEJAfWc6PvI6mEnipw/LwWLZ19yfcvdTdS4cNG5agkkVEpLN0hcgqYFZkeSbwchftvgLcZmYZwGXA63FsKyIiSZauEPkxMMbMtgI1wCozKzGz73Rq9z1gIbAeeMbd34i2bQrrFhGRDrLS8abufhKY02n1TuC+Tu32AdfGsK2IiKSBHjYUEZHQFCIiIhKaQqQLC1csZOGKhekuQ/ow7WOSTKnav8zdk/4m6VRaWuqbNm2Ke7ufzLmMwXWtZA0YmISqRKD5eB2A9jFJiubjddQOzOBzy18Ptb2ZbXb30p7apeXC+rng8LBccpsa9AFJ0jTmBCcCtI9JMjTmZHB4WG7S30f7bxf+5qn4ey8iIucbXRMREZHQFCIiIhKaQkREREJTiIiISGgKERERCU0hIiIioSlEREQkNIWIiIiEphAREZHQ+vzYWWZ2ENgVcvOhdD11bzr11rqg99amuuKjuuLTF+sa7+49zi/e50PkbJjZplgGIEu13loX9N7aVFd8VFd8zue6dDpLRERCU4iIiEhoCpHuPZHuArrQW+uC3lub6oqP6orPeVuXromIiEho6omIiEhoChEREQlNIRJhZllmtszM/mBmT3bTLtfMlpvZFjN72gJnrEtwbT8ys1fM7DkzizobpZlda2ZrI1+7zewOMyszs6oO6ycnsq44ajujjt7wmUVrl+zPLMbPKx37WLaZPd9Dm5TvYzHWlY79q8e6
Iu1SvX/F8nkldP9SiLSbC2xx948Bo8xsahftvgBUufuHgQLgui7WJYSZVQBZ7j4dGAzMitbO3de4e4W7VwBbgVcjtXz/1Hp3356ouuKprYs60v6ZddEuaZ9ZHJ9XqvexPGBzT78z1ftYrHV1UUPaP6807F+xfl4J3b8UIu1WAP8a+dfhEKC2i3Yzgd9EllcDM7pYlygHgO9Glnv872Vm/YGJ7r6VYGf4tJltMLOfJ/pfY3HUFq2O3vCZRWuXzM8s1rpSuo+5e727fwioiqV9qvaxOOpK6f4VR10p3b/iqCuh+5dCJMLd69z9BPAH4IC7v9dF0yLgaGS5FijsYl2i6nrb3TeY2aeAVmBlD5tcB6yKLL8D/J27lwOjgI8nqq44a4tWR9o/sy7aJe0zi+PzSuk+FkLK9rEYpXT/ilWq9684JHT/6vJc8fnGzIqAOuAqYLWZzXD3l6M0PQTkR5bzI68HRlmXyNo+CdwL3OzuzT00vxn4RWS5Eni9w/LwRNYVR23R6oj2Oaa6rjPamVm0WlNdV8r3sTildB+LQbQakrp/xSrV+1eMErp/qSfS7qvAre7eApwA8rpot4r2c9kzgZe7WJcQZjYS+Bowx92P9dDWgGsJuqMAXwFuM7MM4DLad95U1xatjrR/Zl20S9pnFsfnldJ9LB6p3sdilNL9K1ap3r/ikND9SyHS7nFgkZmtA6qBF82sxMy+06ndj4ExZrYVqCH48KOtS5Q7CLq9L0bu5ljURV0AZcAb7t4Qef09YCGwHnjG3d9IYF3x1Batjt7wmZ3RrotaU11XqvexM/SifSyWulK9f8VaV6r3r1jrSuj+pSfWRUQkNPVEREQkNIWIiIiEphAREZHQFCIiIhKaQkSkF4iMXzSli59NidwSKtLraMcU6R2+DlzYxc8ujPxcpNdRiIgkiJmtCbldCTDU3X8Vef0PZrbegpFfB0XWF0XaifQqChGR9JsPfB/AzK4CrgamEwwKujjS5vuRdiK9isbOEkkwM8sBlgLjgF3AAiATeAYYCrwJvOXu/xjZZIK7vxVZng382t3dzFYAlwK4+1vqiUhvpJ6ISOJ9kWBokI8BbxMMczEF2A18FJjUIUA6G0Ew7ATu/p67d5xgKNFD+YucNYWISOJdAqyLLK+LvN4DfAT4LfBYp/YNZjYgslxLMKIqZlZuZl+LLA8A6pNct0jcFCIiibeN4JoGke/bgOuBb7v7Ve7+407tVwCfiiz/geCUFgQTA50Kjk8BLyatYpGQFCIiiffvwKVm9gdgEsH1kVeBfzOz35rZMjO7rEP754C5ZlYYWX7HzDYAFcBTkfW3RH4m0qtoFF+RFDCzLxIMDX6SYL6a/+Puazr8vBi4xt3/I8q2dwC/c/edKSlWJA4KERERCU2ns0REJDSFiIiIhKYQERGR0BQiIiISmkJERERC+/8nvKzX5ikxYwAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x1a108d6f60>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "\n",
    "import matplotlib.pyplot as pyplot\n",
    "import seaborn as sns\n",
    "\n",
    "%matplotlib inline\n",
    "# Plot the cross-validation error curves\n",
    "test_means = grid.cv_results_['mean_test_score']\n",
    "test_stds = grid.cv_results_['std_test_score']\n",
    "train_means = grid.cv_results_['mean_train_score']\n",
    "train_stds = grid.cv_results_['std_train_score']\n",
    "\n",
    "# Reshape to (number of C values, number of penalties):\n",
    "# the grid iterates over C in the outer loop and penalty in the inner loop.\n",
    "n_Cs = len(Cs)\n",
    "number_penaltys = len(penaltys)\n",
    "test_scores = np.array(test_means).reshape(n_Cs, number_penaltys)\n",
    "train_scores = np.array(train_means).reshape(n_Cs, number_penaltys)\n",
    "test_stds = np.array(test_stds).reshape(n_Cs, number_penaltys)\n",
    "train_stds = np.array(train_stds).reshape(n_Cs, number_penaltys)\n",
    "\n",
    "x_axis = np.log10(Cs)\n",
    "for i, value in enumerate(penaltys):\n",
    "    # one test curve and one train curve per penalty, with error bars of one standard deviation\n",
    "    pyplot.errorbar(x_axis, test_scores[:, i], yerr=test_stds[:, i], label=value + ' Test')\n",
    "    pyplot.errorbar(x_axis, train_scores[:, i], yerr=train_stds[:, i], label=value + ' Train')\n",
    "\n",
    "pyplot.legend()\n",
    "pyplot.xlabel('log(C)')\n",
    "pyplot.ylabel('neg-logloss')\n",
    "pyplot.savefig('LogisticGridSearchCV_C.png')\n",
    "\n",
    "pyplot.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lr_tuned = LogisticRegression(penalty='l1', C=0.01, solver='saga', verbose=100, n_jobs=-1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/feixie/anaconda3/lib/python3.6/site-packages/sklearn/svm/base.py:898: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.\n",
      "  \"the number of iterations.\", ConvergenceWarning)\n",
      "/Users/feixie/anaconda3/lib/python3.6/site-packages/sklearn/svm/base.py:898: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.\n",
      "  \"the number of iterations.\", ConvergenceWarning)\n",
      "/Users/feixie/anaconda3/lib/python3.6/site-packages/sklearn/svm/base.py:898: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.\n",
      "  \"the number of iterations.\", ConvergenceWarning)\n",
      "/Users/feixie/anaconda3/lib/python3.6/site-packages/sklearn/svm/base.py:898: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.\n",
      "  \"the number of iterations.\", ConvergenceWarning)\n",
      "/Users/feixie/anaconda3/lib/python3.6/site-packages/sklearn/svm/base.py:898: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.\n",
      "  \"the number of iterations.\", ConvergenceWarning)\n",
      "Process ForkPoolWorker-28:\n",
      "Process ForkPoolWorker-29:\n",
      "Process ForkPoolWorker-27:\n",
      "Process ForkPoolWorker-30:\n",
      "Process ForkPoolWorker-31:\n",
      "Process ForkPoolWorker-32:\n",
      "Process ForkPoolWorker-34:\n",
      "Process ForkPoolWorker-33:\n",
      "Traceback (most recent call last):\n",
      "Traceback (most recent call last):\n",
      "Traceback (most recent call last):\n",
      "Traceback (most recent call last):\n",
      "Traceback (most recent call last):\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/process.py\", line 258, in _bootstrap\n",
      "    self.run()\n",
      "Traceback (most recent call last):\n",
      "Traceback (most recent call last):\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/process.py\", line 258, in _bootstrap\n",
      "    self.run()\n"
     ]
    },
    {
     "ename": "KeyboardInterrupt",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mKeyboardInterrupt\u001b[0m                         Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-20-994f42902834>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m      7\u001b[0m lrcv_L2 = LogisticRegressionCV(Cs= Cs, cv = 5, scoring='neg_log_loss', \n\u001b[1;32m      8\u001b[0m                                penalty='l1',solver='liblinear', verbose=1000,n_jobs=-1)\n\u001b[0;32m----> 9\u001b[0;31m \u001b[0mlrcv_L2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mX_train\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my_train\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;32m~/anaconda3/lib/python3.6/site-packages/sklearn/linear_model/logistic.py\u001b[0m in \u001b[0;36mfit\u001b[0;34m(self, X, y, sample_weight)\u001b[0m\n\u001b[1;32m   1685\u001b[0m                       \u001b[0msample_weight\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msample_weight\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   1686\u001b[0m                       )\n\u001b[0;32m-> 1687\u001b[0;31m             \u001b[0;32mfor\u001b[0m \u001b[0mlabel\u001b[0m \u001b[0;32min\u001b[0m \u001b[0miter_encoded_labels\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m   1688\u001b[0m             for train, test in folds)\n\u001b[1;32m   1689\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py\u001b[0m in \u001b[0;36m__call__\u001b[0;34m(self, iterable)\u001b[0m\n\u001b[1;32m    787\u001b[0m                 \u001b[0;31m# consumption.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    788\u001b[0m                 \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_iterating\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 789\u001b[0;31m             \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mretrieve\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    790\u001b[0m             \u001b[0;31m# Make sure that we get a last message telling us we are done\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    791\u001b[0m             \u001b[0melapsed_time\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtime\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtime\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m-\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_start_time\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py\u001b[0m in \u001b[0;36mretrieve\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m    697\u001b[0m             \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    698\u001b[0m                 \u001b[0;32mif\u001b[0m \u001b[0mgetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_backend\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'supports_timeout'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 699\u001b[0;31m                     \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_output\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mextend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mjob\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtimeout\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtimeout\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    700\u001b[0m                 \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    701\u001b[0m                     \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_output\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mextend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mjob\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/anaconda3/lib/python3.6/multiprocessing/pool.py\u001b[0m in \u001b[0;36mget\u001b[0;34m(self, timeout)\u001b[0m\n\u001b[1;32m    636\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    637\u001b[0m     \u001b[0;32mdef\u001b[0m \u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtimeout\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 638\u001b[0;31m         \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mwait\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtimeout\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    639\u001b[0m         \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mready\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    640\u001b[0m             \u001b[0;32mraise\u001b[0m \u001b[0mTimeoutError\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/anaconda3/lib/python3.6/multiprocessing/pool.py\u001b[0m in \u001b[0;36mwait\u001b[0;34m(self, timeout)\u001b[0m\n\u001b[1;32m    633\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    634\u001b[0m     \u001b[0;32mdef\u001b[0m \u001b[0mwait\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtimeout\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 635\u001b[0;31m         \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_event\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mwait\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtimeout\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    636\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    637\u001b[0m     \u001b[0;32mdef\u001b[0m \u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtimeout\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/anaconda3/lib/python3.6/threading.py\u001b[0m in \u001b[0;36mwait\u001b[0;34m(self, timeout)\u001b[0m\n\u001b[1;32m    549\u001b[0m             \u001b[0msignaled\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_flag\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    550\u001b[0m             \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0msignaled\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 551\u001b[0;31m                 \u001b[0msignaled\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_cond\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mwait\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtimeout\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    552\u001b[0m             \u001b[0;32mreturn\u001b[0m \u001b[0msignaled\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    553\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/anaconda3/lib/python3.6/threading.py\u001b[0m in \u001b[0;36mwait\u001b[0;34m(self, timeout)\u001b[0m\n\u001b[1;32m    293\u001b[0m         \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m    \u001b[0;31m# restore state no matter what (e.g., KeyboardInterrupt)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    294\u001b[0m             \u001b[0;32mif\u001b[0m \u001b[0mtimeout\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 295\u001b[0;31m                 \u001b[0mwaiter\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0macquire\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    296\u001b[0m                 \u001b[0mgotit\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mTrue\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    297\u001b[0m             \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mKeyboardInterrupt\u001b[0m: "
     ]
    }
   ],
   "source": [
    "from sklearn.linear_model import LogisticRegressionCV\n",
    "\n",
    "Cs = [0.01]\n",
    "\n",
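    "# Large training set with an L1 penalty: the solver must be set explicitly (liblinear here),\n",
    "# because the default lbfgs solver only supports L2.\n",
    "# LogisticRegressionCV is typically faster than GridSearchCV for tuning C.\n",
    "# Note: the variable keeps the name lrcv_L2 even though this run uses penalty='l1'.\n",
    "lrcv_L2 = LogisticRegressionCV(Cs=Cs, cv=5, scoring='neg_log_loss',\n",
    "                               penalty='l1', solver='liblinear', verbose=1000, n_jobs=-1)\n",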
    "lrcv_L2.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lrcv_L2.scores_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Memmaping (shape=(100985, 23), dtype=float64) to new file /var/folders/25/tnk7cgq92pn1tswp_3lyp5xh0000gn/T/joblib_memmaping_pool_87376_111935766256/87376-111960081184-79ced4e4dd65cc858ae16d70fe4007ee.pkl\n",
      "Pickling array (shape=(100985,), dtype=int64).\n",
      "Pickling array (shape=(80787,), dtype=int64).\n",
      "Pickling array (shape=(20198,), dtype=int64).\n",
      "Memmaping (shape=(100985, 23), dtype=float64) to old file /var/folders/25/tnk7cgq92pn1tswp_3lyp5xh0000gn/T/joblib_memmaping_pool_87376_111935766256/87376-111960081184-79ced4e4dd65cc858ae16d70fe4007ee.pkl\n",
      "Pickling array (shape=(100985,), dtype=int64).\n",
      "Pickling array (shape=(80788,), dtype=int64).\n",
      "Pickling array (shape=(20197,), dtype=int64).\n",
      "Memmaping (shape=(100985, 23), dtype=float64) to old file /var/folders/25/tnk7cgq92pn1tswp_3lyp5xh0000gn/T/joblib_memmaping_pool_87376_111935766256/87376-111960081184-79ced4e4dd65cc858ae16d70fe4007ee.pkl\n",
      "Pickling array (shape=(100985,), dtype=int64).\n",
      "Pickling array (shape=(80788,), dtype=int64).\n",
      "Pickling array (shape=(20197,), dtype=int64).\n",
      "Memmaping (shape=(100985, 23), dtype=float64) to old file /var/folders/25/tnk7cgq92pn1tswp_3lyp5xh0000gn/T/joblib_memmaping_pool_87376_111935766256/87376-111960081184-79ced4e4dd65cc858ae16d70fe4007ee.pkl\n",
      "Pickling array (shape=(100985,), dtype=int64).\n",
      "Pickling array (shape=(80788,), dtype=int64).\n",
      "Pickling array (shape=(20197,), dtype=int64).\n",
      "Memmaping (shape=(100985, 23), dtype=float64) to old file /var/folders/25/tnk7cgq92pn1tswp_3lyp5xh0000gn/T/joblib_memmaping_pool_87376_111935766256/87376-111960081184-79ced4e4dd65cc858ae16d70fe4007ee.pkl\n",
      "Pickling array (shape=(100985,), dtype=int64).\n",
      "Pickling array (shape=(80789,), dtype=int64).\n",
      "Pickling array (shape=(20196,), dtype=int64).\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Process ForkPoolWorker-6:\n",
      "Process ForkPoolWorker-8:\n",
      "Process ForkPoolWorker-13:\n",
      "Process ForkPoolWorker-10:\n",
      "Process ForkPoolWorker-9:\n",
      "Process ForkPoolWorker-12:\n",
      "Process ForkPoolWorker-7:\n",
      "Process ForkPoolWorker-11:\n",
      "Traceback (most recent call last):\n",
      "Traceback (most recent call last):\n",
      "Traceback (most recent call last):\n",
      "Traceback (most recent call last):\n",
      "Traceback (most recent call last):\n",
      "Traceback (most recent call last):\n",
      "Traceback (most recent call last):\n",
      "Traceback (most recent call last):\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/process.py\", line 258, in _bootstrap\n",
      "    self.run()\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/process.py\", line 258, in _bootstrap\n",
      "    self.run()\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/process.py\", line 258, in _bootstrap\n",
      "    self.run()\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/process.py\", line 258, in _bootstrap\n",
      "    self.run()\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/process.py\", line 258, in _bootstrap\n",
      "    self.run()\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/process.py\", line 93, in run\n",
      "    self._target(*self._args, **self._kwargs)\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/process.py\", line 93, in run\n",
      "    self._target(*self._args, **self._kwargs)\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/process.py\", line 93, in run\n",
      "    self._target(*self._args, **self._kwargs)\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/process.py\", line 93, in run\n",
      "    self._target(*self._args, **self._kwargs)\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/pool.py\", line 108, in worker\n",
      "    task = get()\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/pool.py\", line 108, in worker\n",
      "    task = get()\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/pool.py\", line 108, in worker\n",
      "    task = get()\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/pool.py\", line 108, in worker\n",
      "    task = get()\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/pool.py\", line 360, in get\n",
      "    racquire()\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/pool.py\", line 360, in get\n",
      "    racquire()\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/process.py\", line 258, in _bootstrap\n",
      "    self.run()\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/pool.py\", line 360, in get\n",
      "    racquire()\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/pool.py\", line 360, in get\n",
      "    racquire()\n",
      "KeyboardInterrupt\n",
      "KeyboardInterrupt\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/process.py\", line 258, in _bootstrap\n",
      "    self.run()\n",
      "KeyboardInterrupt\n",
      "KeyboardInterrupt\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Memmaping (shape=(100985, 23), dtype=float64) to old file /var/folders/25/tnk7cgq92pn1tswp_3lyp5xh0000gn/T/joblib_memmaping_pool_87376_111935766256/87376-111960081184-79ced4e4dd65cc858ae16d70fe4007ee.pkl\n",
      "Pickling array (shape=(100985,), dtype=int64).\n",
      "Pickling array (shape=(80789,), dtype=int64).\n",
      "Pickling array (shape=(20196,), dtype=int64).\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/process.py\", line 93, in run\n",
      "    self._target(*self._args, **self._kwargs)\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/process.py\", line 93, in run\n",
      "    self._target(*self._args, **self._kwargs)\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/process.py\", line 93, in run\n",
      "    self._target(*self._args, **self._kwargs)\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/process.py\", line 258, in _bootstrap\n",
      "    self.run()\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/pool.py\", line 108, in worker\n",
      "    task = get()\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/pool.py\", line 108, in worker\n",
      "    task = get()\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/pool.py\", line 362, in get\n",
      "    return recv()\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/process.py\", line 93, in run\n",
      "    self._target(*self._args, **self._kwargs)\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/connection.py\", line 250, in recv\n",
      "    buf = self._recv_bytes()\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/pool.py\", line 108, in worker\n",
      "    task = get()\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/pool.py\", line 108, in worker\n",
      "    task = get()\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/pool.py\", line 360, in get\n",
      "    racquire()\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/pool.py\", line 360, in get\n",
      "    racquire()\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/connection.py\", line 407, in _recv_bytes\n",
      "    buf = self._recv(4)\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/pool.py\", line 360, in get\n",
      "    racquire()\n",
      "KeyboardInterrupt\n",
      "KeyboardInterrupt\n",
      "  File \"/Users/feixie/anaconda3/lib/python3.6/multiprocessing/connection.py\", line 379, in _recv\n",
      "    chunk = read(handle, remaining)\n",
      "KeyboardInterrupt\n",
      "KeyboardInterrupt\n"
     ]
    },
    {
     "ename": "KeyboardInterrupt",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mKeyboardInterrupt\u001b[0m                         Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-16-7d42a344d487>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m      5\u001b[0m lrcv_L2 = LogisticRegressionCV(Cs= Cs, cv = 5, scoring='neg_log_loss', \n\u001b[1;32m      6\u001b[0m                                penalty='l2', verbose=1000,n_jobs=-1)\n\u001b[0;32m----> 7\u001b[0;31m \u001b[0mlrcv_L2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mX_train\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my_train\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m      8\u001b[0m \u001b[0mlrcv_L2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mscores_\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/anaconda3/lib/python3.6/site-packages/sklearn/linear_model/logistic.py\u001b[0m in \u001b[0;36mfit\u001b[0;34m(self, X, y, sample_weight)\u001b[0m\n\u001b[1;32m   1685\u001b[0m                       \u001b[0msample_weight\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msample_weight\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   1686\u001b[0m                       )\n\u001b[0;32m-> 1687\u001b[0;31m             \u001b[0;32mfor\u001b[0m \u001b[0mlabel\u001b[0m \u001b[0;32min\u001b[0m \u001b[0miter_encoded_labels\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m   1688\u001b[0m             for train, test in folds)\n\u001b[1;32m   1689\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py\u001b[0m in \u001b[0;36m__call__\u001b[0;34m(self, iterable)\u001b[0m\n\u001b[1;32m    787\u001b[0m                 \u001b[0;31m# consumption.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    788\u001b[0m                 \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_iterating\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 789\u001b[0;31m             \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mretrieve\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    790\u001b[0m             \u001b[0;31m# Make sure that we get a last message telling us we are done\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    791\u001b[0m             \u001b[0melapsed_time\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtime\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtime\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m-\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_start_time\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py\u001b[0m in \u001b[0;36mretrieve\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m    697\u001b[0m             \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    698\u001b[0m                 \u001b[0;32mif\u001b[0m \u001b[0mgetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_backend\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'supports_timeout'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 699\u001b[0;31m                     \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_output\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mextend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mjob\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtimeout\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtimeout\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    700\u001b[0m                 \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    701\u001b[0m                     \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_output\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mextend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mjob\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/anaconda3/lib/python3.6/multiprocessing/pool.py\u001b[0m in \u001b[0;36mget\u001b[0;34m(self, timeout)\u001b[0m\n\u001b[1;32m    636\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    637\u001b[0m     \u001b[0;32mdef\u001b[0m \u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtimeout\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 638\u001b[0;31m         \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mwait\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtimeout\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    639\u001b[0m         \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mready\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    640\u001b[0m             \u001b[0;32mraise\u001b[0m \u001b[0mTimeoutError\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/anaconda3/lib/python3.6/multiprocessing/pool.py\u001b[0m in \u001b[0;36mwait\u001b[0;34m(self, timeout)\u001b[0m\n\u001b[1;32m    633\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    634\u001b[0m     \u001b[0;32mdef\u001b[0m \u001b[0mwait\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtimeout\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 635\u001b[0;31m         \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_event\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mwait\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtimeout\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    636\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    637\u001b[0m     \u001b[0;32mdef\u001b[0m \u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtimeout\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/anaconda3/lib/python3.6/threading.py\u001b[0m in \u001b[0;36mwait\u001b[0;34m(self, timeout)\u001b[0m\n\u001b[1;32m    549\u001b[0m             \u001b[0msignaled\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_flag\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    550\u001b[0m             \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0msignaled\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 551\u001b[0;31m                 \u001b[0msignaled\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_cond\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mwait\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtimeout\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    552\u001b[0m             \u001b[0;32mreturn\u001b[0m \u001b[0msignaled\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    553\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/anaconda3/lib/python3.6/threading.py\u001b[0m in \u001b[0;36mwait\u001b[0;34m(self, timeout)\u001b[0m\n\u001b[1;32m    293\u001b[0m         \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m    \u001b[0;31m# restore state no matter what (e.g., KeyboardInterrupt)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    294\u001b[0m             \u001b[0;32mif\u001b[0m \u001b[0mtimeout\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 295\u001b[0;31m                 \u001b[0mwaiter\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0macquire\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    296\u001b[0m                 \u001b[0mgotit\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mTrue\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    297\u001b[0m             \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mKeyboardInterrupt\u001b[0m: "
     ]
    }
   ],
   "source": [
    "Cs = [0.01]\n",
    "\n",
    "# Large training set (the joblib log shows X of shape (100985, 23)), L2 penalty --> default lbfgs solver\n",
    "# LogisticRegressionCV is typically faster than GridSearchCV for tuning C\n",
    "lrcv_L2 = LogisticRegressionCV(Cs=Cs, cv=5, scoring='neg_log_loss',\n",
    "                               penalty='l2', verbose=1000, n_jobs=-1)\n",
    "lrcv_L2.fit(X_train, y_train) \n",
    "lrcv_L2.scores_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lrcv_L2.scores_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
